Methods for generating a distribution of optimal solutions to nondeterministic polynomial optimization problems

ABSTRACT

The present invention overcomes problems in prior art DNA-based computing methods for solving non-deterministic polynomial optimization problems, by providing methods that derive the most probable answers in a statistically significant manner that makes the methods scalable with increases in the number of data inputs, and thus makes the methods practical.

RELATED APPLICATIONS

This application is a continuation in part of international application PCT/US2007/002331 with an international filing date of Jan. 29, 2007, which claims priority to U.S. Provisional Patent Application Ser. No. 60/762,971 filed Jan. 27, 2006, both of which are incorporated by reference herein in their entirety.

STATEMENT OF GOVERNMENT SUPPORT

The work disclosed herein was supported, at least in part, by DARPA/DSO grant AF9550-05-1-0424, and thus the U.S. government may have certain rights in the invention.

BACKGROUND OF THE INVENTION

Computing by representing information in the form of DNA base sequences enjoys several potential advantages over silicon-based computing methods, due to the massive parallelism of the biochemical reactions on DNA molecules. These advantages include significantly enhanced processing speeds, significantly reduced energy consumption, and significantly greater storage capacity. DNA computing can solve problems intractable by conventional computing methods, including but not limited to organization of mass evacuations, organization of response to invasion, supply chain problems, and computer chip assembly problems. As a result, there is tremendous interest in utilizing the computing capacity of DNA.

A “nondeterministic polynomial optimization problem” belongs to a class of optimization problems for which no efficient solution algorithm has been found. Tractable problems can be solved by computer algorithms that run in polynomial time; i.e., for a problem of size n, the time or number of steps needed to find the solution is a polynomial function of n. An optimization problem is called NP (nondeterministic polynomial) if its solution (if one exists) can be guessed and verified in polynomial time; nondeterministic means that no particular rule is followed to make the guess. Such “NP optimization problems” of any complexity thus pose a difficult computing issue, as previous attempts to solve them using DNA-based computing required generation of the complete solution set. Thus, improved DNA-based computing methods for solving NP optimization problems are needed in the art.

SUMMARY OF THE INVENTION

In one aspect, the present invention provides methods for generating a distribution of optimal solutions to a nondeterministic polynomial optimization problem comprising:

(a) providing n input polynucleotides, wherein n equals a number of data inputs, and wherein each input polynucleotide:

    (i) represents a unique data input; and
    (ii) comprises an x segment and a y segment;

(b) providing z connection polynucleotides, wherein z equals a number of unique connections that can be made between the different data inputs, and wherein each polynucleotide in the set of connection polynucleotides is complementary to the x segment of one input polynucleotide and to the y segment of one different input polynucleotide;

(c) combining the input polynucleotides with the connection polynucleotides to form a mixture, wherein the combining is done under conditions to promote formation of hybridization complexes between complementary polynucleotides, and wherein each individual connection polynucleotide is added at a concentration based on a weight assigned to the individual connection polynucleotide;

(d) ligating input polynucleotides that are present in a hybridization complex to form ligation products; and

(e) determining a concentration of the ligation products, wherein the ligation products present at the highest concentration represent optimal solutions to the nondeterministic polynomial optimization problem.

In one embodiment, step (c) comprises:

(i) combining all input polynucleotides only with those connection polynucleotides that are complementary to the x or y segment of a starting input polynucleotide to form a mixture, wherein the combining is done under conditions to promote hybridization between complementary polynucleotides to form a first hybridization complex between the starting input polynucleotide, a second input polynucleotide, and one connection polynucleotide; and

(ii) combining all remaining connection polynucleotides not combined in step (i) with the mixture, wherein the combining is done under conditions to promote hybridization between complementary polynucleotides, wherein the remaining connection polynucleotides are added at a concentration based on a weight assigned to each individual remaining connection polynucleotide.

In another embodiment, step (c) comprises:

(i) combining all input polynucleotides only with those connection polynucleotides that are complementary to the x or y segment of at least two, but less than all, of the input polynucleotides, to form a mixture, wherein the combining is done under conditions to promote formation of hybridization complexes between complementary polynucleotides, and wherein each individual connection polynucleotide is added at a concentration based on a weight assigned to the individual connection polynucleotide; and

(ii) combining all remaining connection polynucleotides not combined in step (i) with the mixture, wherein the combining is done under conditions to promote hybridization between complementary polynucleotides, wherein the remaining connection polynucleotides are added at a concentration based on a weight assigned to each individual remaining connection polynucleotide.

In various further embodiments described in more detail below, step (c)(ii) is repeated a desired number of times; the input polynucleotides are present in saturating concentration relative to the connection polynucleotides; determining a concentration of the ligation products comprises determining a length of the ligation products; the method comprises purifying those ligation products that contain each input polynucleotide prior to determining a concentration of the ligation products; determining a concentration of the ligation products comprises determining an order of polynucleotides in the ligation products; the determining produces a reduced distance matrix with nonzero values only for those ligation products that exist in an optimal answer set; and the nondeterministic polynomial optimization problem is selected from the group consisting of evacuation planning, invasion response planning, supply chain problems, computer chip assembly, shortest path problems, graph theory problems, network design problems, sets and partitions problems, storage and retrieval problems, sequencing and scheduling problems, mathematical programming problems, algebra and number theory problems, and program optimization problems.

In another aspect, the present invention provides a computer readable storage medium comprising a set of instructions for causing a processing device to execute procedures for carrying out the methods of the invention.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 (A) General schematic of the traveling salesman problem; (B) example of how cities are defined as sequences; (C) list of pathways that exist in the graph; (D) how the pathway sequences bind to the city sequences; (E) how the ligation extends the sequences to form answers; (F) example of a solution.

FIG. 2 shows the product of the ligation step.

FIG. 3 shows a cartoon of a PAGE gel after answers are formed. The arrow points to the band of the correct size, which will be excised.

FIG. 4 shows the procedure used to purify out any solutions that have visited the same city multiple times.

FIG. 5 shows the LCR method that is used to read out the final solution.

FIG. 6 is a PAGE profile of ligase chain reaction products. For each LCR, the 230-mer DNA solutions plus two pairs of probes were included. If the solutions are read out using the ATPase motor detection technique, two probes need to be biotinylated.

FIG. 7 shows an exemplary flow chart for an automated embodiment of the methods of the invention.

FIG. 8 shows an example of city B, city C and the pathway BC that connects them.

FIG. 9 shows two exemplary sequences, representing the starting city A_(s) and the ending city A_(e), that were designed to replace the single city A sequence. The second half and the first half of the original city A sequence were used in the new starting and ending sequences, respectively.

FIG. 10(a) (A-C) is an exemplary denaturing polyacrylamide gel for purification of bands corresponding to DNA encoding round-trip routes entering exactly 16 cities (25-mer cities A_(s) and A_(e); 20-mer for the other 14 cities).

FIG. 10(b) presents the order of cities in the routes from the single band shown in FIG. 10(a)(C), which start and stop at city A and also include each of the other 14 cities once and only once.

FIG. 11 (A-O) PAGE profile of 40mer LCR products for every possible city pairing. The letter on the left of each gel indicates the preceding city of the pair, and each lane tests the relative abundance, in the answer sequences, of the specific city pairing indicated by the letter above the lane. Arrows indicate the most abundant product band on each gel. Each lane contains excess probes, which appear as a 20mer band.

DETAILED DESCRIPTION OF THE INVENTION

The present invention overcomes problems in prior art DNA-based computing methods for solving NP optimization problems, by providing methods that derive the most probable answers in a statistically significant manner that makes the methods scalable with increases in the number of data inputs, and thus makes the methods practical.

In one aspect, the present invention provides methods for generating a distribution of optimal solutions to a nondeterministic polynomial optimization problem comprising:

(a) providing n input polynucleotides, wherein n equals a number of data inputs, and wherein each input polynucleotide:

    (i) represents a unique data input; and
    (ii) comprises an x segment and a y segment;

(b) providing z connection polynucleotides, wherein z equals a number of unique connections that can be made between the different data inputs, and wherein each polynucleotide in the set of connection polynucleotides is complementary to the x segment of one input polynucleotide and to the y segment of one different input polynucleotide;

(c) combining the input polynucleotides with the connection polynucleotides to form a mixture, wherein the combining is done under conditions to promote formation of hybridization complexes between complementary polynucleotides, and wherein each individual connection polynucleotide is added at a concentration based on a weight assigned to the individual connection polynucleotide;

(d) ligating input polynucleotides that are present in a hybridization complex to form ligation products; and

(e) determining a concentration of the ligation products, wherein the ligation products present at the highest concentration represent optimal solutions to the nondeterministic polynomial optimization problem.

A “nondeterministic polynomial optimization problem” belongs to a class of optimization problems for which no efficient solution algorithm has been found (see, for example, www.cs.auc.dk/~luca/FS2/NP-completeness.html). Tractable problems can be solved by computer algorithms that run in polynomial time; i.e., for a problem of size n, the time or number of steps needed to find the solution is a polynomial function of n. An optimization problem is called NP (nondeterministic polynomial) if its solution (if one exists) can be guessed and verified in polynomial time; nondeterministic means that no particular rule is followed to make the guess. Examples of such NP optimization problems are described below.
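The “verified in polynomial time” property can be illustrated with the decision version of the traveling salesman problem: “is there a round trip of total length at most B?” The Python sketch below checks a guessed tour against that question. The function name `verify_tour` and the distance-matrix representation are assumptions made for illustration only. The check itself is the standard polynomial-time verification step: confirm that the guess is a permutation, then add up n distances.

```python
def verify_tour(tour, distance, bound):
    """Check a guessed round-trip tour in polynomial time (O(n log n) here).

    `tour` lists city indices in visiting order, without repeating the start.
    The guess is accepted if it visits every city exactly once and its total
    length, including the return to the first city, is at most `bound`.
    """
    n = len(distance)
    if sorted(tour) != list(range(n)):  # every city exactly once
        return False
    total = sum(distance[tour[k]][tour[(k + 1) % n]] for k in range(n))
    return total <= bound


if __name__ == "__main__":
    d = [[0, 1, 4, 3], [1, 0, 2, 5], [4, 2, 0, 1], [3, 5, 1, 0]]
    print(verify_tour([0, 1, 2, 3], d, 7))
```

Checking a guess is cheap. What makes such problems hard is the number of guesses, which grows factorially with the number of cities.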

As used herein, a “distribution of optimal solutions” means one or more best solutions from the set of possible solutions. In one embodiment, the method identifies the most optimal solution, which is the one represented by the ligation product present at the highest concentration. In a further embodiment, the method identifies the most probable answers, and the method further comprises determining the most optimal solution to the problem from this subset of most probable answers using conventional computing methods.

As used herein, “polynucleotide” means DNA, RNA, peptide nucleic acids (“PNA”), locked nucleic acids (“LNA”), and nucleic acid-like structures, as well as combinations thereof and analogues thereof. Nucleic acid analogues include known analogues of natural nucleotides which have similar or improved binding properties. “Analogous” forms of purines and pyrimidines are well known in the art, and include, but are not limited to, aziridinylcytosine, 4-acetylcytosine, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethylaminomethyluracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5-methoxyuracil, 2-methylthio-N-6-isopentenyladenine, uracil-5-oxyacetic acid methylester, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid, and 2,6-diaminopurine. The oligonucleotides may also comprise nucleic acid backbone analogues including, but not limited to, phosphodiester, phosphorothioate, phosphorodithioate, methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3′-thioacetal, methylene(methylimino), 3′-N-carbamate, morpholino carbamate, and peptide nucleic acids (PNAs), methylphosphonate linkages or alternating methylphosphonate and phosphodiester linkages (Strauss-Soukup (1997) Biochemistry 36:8692-8698), and benzylphosphonate linkages, as discussed in U.S. Pat. No. 6,664,057; see also Oligonucleotides and Analogues, a Practical Approach, edited by F. Eckstein, IRL Press at Oxford University Press (1991); Antisense Strategies, Annals of the New York Academy of Sciences, Volume 600, Eds. Baserga and Denhardt (NYAS 1992); Milligan (1993) J. Med. Chem. 36:1923-1937; Antisense Research and Applications (1993, CRC Press).

The oligonucleotides may also contain analogous forms of ribose or deoxyribose as are well known in the art, including but not limited to 2′ substituted sugars such as 2′-O-methyl-, 2′-fluoro- or 2′-azido-ribose, carbocyclic sugar analogs, α-anomeric sugars, epimeric sugars such as arabinose, xyloses or lyxoses, pyranose sugars, sedoheptuloses, acyclic analogs and abasic nucleoside analogs such as methyl riboside. The oligonucleotides may also contain TNA (threose nucleic acid; also referred to as alpha-threofuranosyl oligonucleotides) (see, for example, Schöning et al., Science 2000 Nov. 17, 290(5495):1347-1351).

The oligonucleotides may also comprise nucleic-acid-like structures with synthetic backbones. DNA backbone analogues provided by the invention include phosphodiester, phosphorothioate, phosphorodithioate, methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3′-thioacetal, methylene(methylimino), 3′-N-carbamate, morpholino carbamate, and peptide nucleic acids (PNAs), methylphosphonate linkages or alternating methylphosphonate and phosphodiester linkages (Strauss-Soukup (1997) Biochemistry 36:8692-8698), and benzylphosphonate linkages, as discussed in U.S. Pat. No. 6,664,057; see also Oligonucleotides and Analogues, a Practical Approach, edited by F. Eckstein, IRL Press at Oxford University Press (1991); Antisense Strategies, Annals of the New York Academy of Sciences, Volume 600, Eds. Baserga and Denhardt (NYAS 1992); Milligan (1993) J. Med. Chem. 36:1923-1937; Antisense Research and Applications (1993, CRC Press).

As used herein, “data inputs” are data points to be considered when solving any particular NP optimization problem. Each data input is represented by a unique input polynucleotide, and thus the number of input polynucleotides is equal to the number of unique data inputs. The number “n” of data inputs that can be processed using the methods of the invention is limited only by the number of unique sequences that can be designed.

As used herein, the “x segment” and “y segment” are different portions of the input polynucleotides designed to provide for the required hybridization with the connection polynucleotides (i.e., each polynucleotide in the set of connection polynucleotides is complementary to the x segment of one input polynucleotide and to the y segment of one different input polynucleotide). See, for example, FIG. 1D (input polynucleotides are referred to as “cities” and connection polynucleotides as “pathways” for purposes of the traveling salesman problem exemplified in the figure). There are no specific sequence requirements associated with an “x segment” versus a “y segment.”
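The following Python sketch shows one way a connection polynucleotide could be derived from two input polynucleotides. It rests on two hypothetical choices. First, each input is written 5′-x-y-3′ with x and y segments of equal length. Second, the connection for the ordered pair i→j is the reverse complement of the y segment of input i followed by the x segment of input j. The function names and example sequences are placeholders, not the sequences used in the figures.

```python
# Hypothetical layout: every input polynucleotide is 5'-[x segment][y segment]-3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3' (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_input(seq):
    """Split an input polynucleotide into its x (5' half) and y (3' half) segments."""
    half = len(seq) // 2
    return seq[:half], seq[half:]


def connection_polynucleotide(input_i, input_j):
    """Connection for the ordered pair i -> j under the hypothetical layout.

    It is complementary to the y segment of input i and the x segment of
    input j, so when it hybridizes it holds the 3' end of input i directly
    against the 5' end of input j, leaving a nick for the ligase.
    """
    _, y_i = split_input(input_i)
    x_j, _ = split_input(input_j)
    return reverse_complement(y_i + x_j)


if __name__ == "__main__":
    city_b = "ACGTTGCAAGGCTTACCGTA"  # placeholder 20-mer
    city_c = "TTGACCGATCGGATCCAGTC"  # placeholder 20-mer
    print(connection_polynucleotide(city_b, city_c))
```

Under this layout, the connection for C→B differs from the connection for B→C, because it pairs the y segment of C with the x segment of B.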

In one embodiment, each of the input polynucleotides and/or the connection polynucleotides is the same or similar length, and the x and y segments of each polynucleotide are of the same length. In another embodiment, the input polynucleotides and/or the connection polynucleotides are not the same length, but have similar melting temperatures. The input polynucleotides and/or the connection polynucleotides can comprise the x and y segments, or can consist of the x and y segments. In embodiments where the input polynucleotides and/or the connection polynucleotides comprise the x and y segments, they can contain additional nucleotides either between the x and y segments or at one or both termini, most preferably where any additional nucleotides for the input polynucleotides are between the x and y segments, and where any additional nucleotides for the connection polynucleotides are at one or both termini of the connection polynucleotide. In one non-limiting example, additional nucleotides (such as GC-rich nucleotides) can be added at the terminus of one or more input polynucleotides to increase their melting temperature relative to other input polynucleotides. This may be desirable, for example, in embodiments where a start and/or end input polynucleotide is known a priori (for example, in generating answers to the traveling salesman problem, as disclosed below), in order to reduce inappropriate insertion of starting and/or ending input polynucleotides inside the solutions.
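Matching melting temperatures in practice calls for a proper nearest-neighbor calculation. As a rough Python sketch only, the basic GC-content formula (with the Wallace rule for very short oligonucleotides) shows the direction of the effect described above: appending GC-rich nucleotides raises the estimated melting temperature. This formula is a swapped-in approximation, not the method used in this disclosure. It ignores salt and strand concentration, and the sequence is a placeholder.

```python
def basic_tm(seq):
    """Rough melting-temperature estimate in degrees C for a short DNA oligo.

    Wallace rule for 13 nt or fewer: Tm = 2*(A+T) + 4*(G+C).
    Basic GC formula for longer oligos: Tm = 64.9 + 41*(G+C - 16.4)/N.
    Salt, strand concentration and nearest-neighbor stacking are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if len(seq) <= 13:
        return 2.0 * at + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


if __name__ == "__main__":
    city = "ACGTTGCAAGGCTTACCGTA"               # placeholder 20-mer, 50% GC
    print(round(basic_tm(city), 1))             # plain input polynucleotide
    print(round(basic_tm(city + "GCGC"), 1))    # with a GC-rich terminal extension
```

For this placeholder 20-mer, the estimate rises from about 51.8 °C to about 60.8 °C when GCGC is appended.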

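The number of connection polynucleotides (“z”) is equal to the number of unique connections that can be made between the different data inputs. The term “connection” refers to a link between data inputs. Because each connection pairs the x segment of one data input with the y segment of a different data input, connections are ordered pairs of distinct data inputs, and the number of unique connections is equal to n*(n−1).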
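The count n*(n−1) can be checked by enumerating every ordered pair of distinct data inputs, as in this short Python sketch. The function name is illustrative.

```python
from itertools import permutations


def connections(inputs):
    """Every ordered pair (i, j) of distinct data inputs, i.e. every directed connection."""
    return list(permutations(inputs, 2))


if __name__ == "__main__":
    cities = ["A", "B", "C", "D"]
    pairs = connections(cities)
    print(len(pairs), pairs[:3])
```

For four inputs this gives 12 = 4*3 connections. A→B and B→A are counted separately because they pair different segments.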

Each individual connection polynucleotide is added to the mixture at a concentration based on a weight assigned to the individual connection polynucleotide. The weight assigned to individual connection polynucleotides differs depending on the NP optimization problem being solved and the specifics of the data inputs employed. Since the methods of the invention rely upon finding the optimal answer by identifying the sequences with the highest concentration, the method is fundamentally a maximization technique. Thus, to solve a minimization problem, it is most preferred to use a weighting function that inverts the relative weights. Any 1:1 function mapping the parameters for a given connection polynucleotide to the concentration of that connection polynucleotide is acceptable.
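The inversion can be sketched in Python as follows. The sketch assumes, hypothetically, that the cost to minimize is a positive distance per connection and that 1/distance is the chosen weighting. Any one-to-one mapping that reverses the ordering of the costs would serve. The function name, and the idea of scaling to a fixed total concentration, are illustrative assumptions.

```python
def concentrations_for_minimization(costs, total_concentration):
    """Turn per-connection costs into concentrations that rise as cost falls.

    `costs` maps an ordered connection (a, b) to a positive cost.
    Hypothetical weighting: weight = 1 / cost, which inverts the ordering of
    the costs. The weights are then scaled so that all connection
    polynucleotides together add up to `total_concentration` (any unit).
    """
    if any(c <= 0 for c in costs.values()):
        raise ValueError("costs must be positive")
    weights = {pair: 1.0 / c for pair, c in costs.items()}
    scale = total_concentration / sum(weights.values())
    return {pair: w * scale for pair, w in weights.items()}


if __name__ == "__main__":
    distances = {("A", "B"): 1.0, ("A", "C"): 2.0, ("A", "D"): 4.0}
    print(concentrations_for_minimization(distances, 7.0))
```

For costs of 1, 2 and 4 on three connections and a total of 7 units, the concentrations come out as 4, 2 and 1 units, so the shortest connection is the most abundant.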

For example, the weight assigned to a path between two cities may merely be a function of the distance between them. However, it could also include factors like the road conditions or the differential accessibility of fueling stations on different roads. As will be apparent to those of skill in the art, many such factors could influence the weight assigned to a specific connection polynucleotide. Identification of such factors, and determining how to weight the connection polynucleotide based on those factors, is well within the level of skill in the art, based on the teachings herein. For example, for a shipping problem, the weight could be the cost of shipping plus the time of delivery, plus the number of times the package needs to be handled. For a map-covering problem, the weight could be calculated from the number of edges between vertices. Further non-limiting examples of weights that could be used are provided below.

As a result of the weighting, the methods of the invention generate solutions in concentrations proportional to their optimality. Many poor solution sequences occur with such low probability that they may not be formed at all, and the methods thus find an optimal subset of all possible solutions.
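The following Python sketch makes the link between weighting and product abundance concrete. It scores full-length products under a deliberately simplified model that is an assumption, not part of this disclosure. In this model, a product grows one ligation at a time from a known starting input, and the next input is chosen in proportion to the concentrations of the connection polynucleotides leaving the current input. Hybridization kinetics, ligation efficiency and strand competition are ignored.

```python
from itertools import permutations


def path_probability(path, weights):
    """Probability of growing `path` one ligation at a time (toy model).

    `weights` maps each ordered connection (a, b) to its relative
    concentration. At every step the next input is drawn in proportion to
    the weights of all connections leaving the current input, including
    connections back to inputs already used. Products with repeats are
    later purified away, so the scores of repeat-free paths sum to < 1.
    """
    p = 1.0
    for a, b in zip(path, path[1:]):
        outgoing = sum(w for (src, _), w in weights.items() if src == a)
        p *= weights[(a, b)] / outgoing
    return p


def rank_full_length_products(start, inputs, weights):
    """Score every path from `start` that uses each input exactly once, best first."""
    others = [x for x in inputs if x != start]
    paths = [(start, *rest) for rest in permutations(others)]
    return sorted(((path_probability(p, weights), p) for p in paths), reverse=True)


if __name__ == "__main__":
    cities = ["A", "B", "C", "D"]
    w = {(a, b): 1.0 for a in cities for b in cities if a != b}
    w.update({("A", "B"): 4.0, ("B", "C"): 4.0, ("C", "D"): 4.0})
    for prob, path in rank_full_length_products("A", cities, w)[:3]:
        print("-".join(path), round(prob, 4))
```

Take four inputs A through D, with every connection at weight 1 except A→B, B→C and C→D at weight 4. The path A→B→C→D is ranked first, with probability (4/6)³ = 8/27 ≈ 0.30.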

The method comprises combining the input polynucleotides with the connection polynucleotides to form a mixture, wherein the combining is done under conditions to promote formation of hybridization complexes between complementary polynucleotides. In a preferred embodiment, the input polynucleotides are added in saturating concentration. In a further embodiment, an automatic fluidic handling system is used to improve precision in the amounts of the polynucleotides mixed together.

The specific hybridization conditions used will depend on the length of the polynucleotide probes employed, their GC content, as well as various other factors as is well known to those of skill in the art. (See, for example, Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Acid Probes part I, chapt. 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” Elsevier, N.Y. (“Tijssen”)).

The methods of the invention comprise ligating input polynucleotides that are present in a hybridization complex to form ligation products. An exemplary hybridization complex includes a single connection polynucleotide base-paired with an x segment from one input polynucleotide and also base-paired with a y segment from a different input polynucleotide, thus juxtaposing two different input polynucleotides, which can then be ligated. The ligation can be accomplished by techniques known to those of skill in the art using commercially available nucleic acid ligases. Any nucleic acid ligase (depending on the nature of the polynucleotide) is suitable for use in the disclosed methods of the invention. Preferred ligases are those that preferentially form phosphodiester bonds at nicks in double-stranded DNA. That is, ligases that fail to ligate the free ends of single-stranded DNA at a significant rate are preferred. Thermostable ligases are especially preferred. Many suitable ligases are known, such as T4 DNA ligase (Davis et al., Advanced Bacterial Genetics—A Manual for Genetic Engineering (Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1980)), E. coli DNA ligase (Panasenko et al., J. Biol. Chem. 253:4590-4592 (1978)), AMPLIGASE™ (Kalin et al., Mutat Res., 283(2):119-123 (1992); Winn-Deen et al., Mol Cell Probes (England) 7(3):179-186 (1993)), Taq DNA ligase (Barany, Proc. Natl. Acad. Sci. USA 88:189-193 (1991)), Thermus thermophilus DNA ligase (Abbott Laboratories), Thermus scotoductus DNA ligase and Rhodothermus marinus DNA ligase (Thorbjarnardottir et al., Gene 151:177-180 (1995)).

In one embodiment, the combining in step (c) of the methods of the invention comprises:

(i) combining all input polynucleotides only with those connection polynucleotides that are complementary to the x or y segment of a starting input polynucleotide to form a mixture, wherein the combining is done under conditions to promote hybridization between complementary polynucleotides to form a first hybridization complex between the starting input polynucleotide, a second input polynucleotide, and one connection polynucleotide; and

(ii) combining all remaining connection polynucleotides not combined in step (i) with the mixture, wherein the combining is done under conditions to promote hybridization between complementary polynucleotides, wherein the remaining connection polynucleotides are added at a concentration based on a weight assigned to each individual remaining connection polynucleotide.

This embodiment is particularly useful for NP optimization problems where a starting data input is known, including but not limited to the shortest path problem, where a starting point in the path is known (see below).
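For the embodiment above, the split of connection polynucleotides between step (i) and step (ii) can be sketched in Python. Connections are modeled as ordered pairs of input labels. A connection is treated as complementary to the x or y segment of the starting input whenever the starting input is one of its two ends. These modeling choices and the function name are assumptions.

```python
from itertools import permutations


def staged_additions(inputs, start):
    """Split connection polynucleotides into the additions of steps (i) and (ii).

    Step (i): connections complementary to the x or y segment of the starting
    input, modeled as every ordered pair in which `start` is one of the ends.
    Step (ii): all remaining connections.
    """
    all_connections = list(permutations(inputs, 2))
    step_i = [c for c in all_connections if start in c]
    step_ii = [c for c in all_connections if start not in c]
    return step_i, step_ii


if __name__ == "__main__":
    first, remaining = staged_additions(["A", "B", "C", "D"], "A")
    print(len(first), len(remaining))
```

With n inputs, step (i) receives 2(n−1) connections and step (ii) receives the other (n−1)(n−2).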

In further examples of this embodiment, step (i) can comprise adding only those connection polynucleotides that are complementary to the x or y segment of two, three, or more, but less than all, of the input polynucleotides, followed by step (ii) as noted above. In a further embodiment, step (i) can comprise adding all but the ending input polynucleotide, where the ending input polynucleotide is known a priori (as, for example, in the traveling salesman problem).

In a further embodiment of each of these embodiments, step (ii) is repeated a desired number of times; in a further preferred embodiment, step (ii) is repeated (n−2) times.

In a further embodiment of each of these embodiments, step (i) is eliminated and step (ii) is repeated a desired number of times.

The methods of the invention also comprise determining a concentration of specific ligation products, wherein the ligation products present at the highest concentration represent optimal solutions to the nondeterministic polynomial optimization problem. This step involves identification of specific ligation products (i.e., “solutions”), for which a concentration is then determined; ligation products present at the highest concentration represent optimal solutions to the problem. Such identification involves determining the order of polynucleotides in the ligation product. Any method for determining the order of the polynucleotides in a given ligation product can be used, including but not limited to nucleic acid sequence analysis, polymerase chain reaction-based techniques, photoelectrochemical detection (for example, Gao and Tanzil, Nucleic Acids Res. 2005 Aug. 1; 33(13):e123); quantum dot labeling (for example, Crut et al., Nucleic Acids Res. 2005 Jun. 20; 33(11):e98); multiplexed microsphere-based suspension array platforms (Dunbar, Clin Chim Acta. 2006 January; 363(1-2):71-82. Epub 2005 Aug. 15); molecular beacons (Tsourkas et al., Nucleic Acids Res. 2003 Feb. 15; 31(4):1319-30); molecular semaphores (WO 2005/080603); Bio-bar code PCR (for example, Nam et al., J Am Chem Soc. 2004 May 19; 126(19):5932-3); and DNA footprinting. In one embodiment, a 3′ biotinylated primer that is complementary to an input sequence is mixed with a 5′ biotinylated primer that is complementary to another input. Ligase is used to link the two biotinylated primers together to form a detection complex, which can be detected using a molecular semaphore device, as disclosed in WO 2005/080603, incorporated by reference herein in its entirety. The antiparallel nature of DNA combined with the 3′ and 5′ labels allows the order of the input sequences to be determined.

Every possible input pairing (where an input pairing is a subset of a solution) is tested, n²−n pairings in total. This results in a matrix with nonzero values only for those input pairings that exist in the optimal answer set; the more probable the pairing, the higher its concentration (see, for example, the Tables in the Examples). Thus, input pairings that are not detected register as a “zero” value in the matrix.
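The following hypothetical Python sketch shows how such a matrix could be assembled. It supposes that each detected ligation product is known as an ordered tuple of input labels together with a measured concentration. Each adjacent pair in a product then contributes that concentration to the corresponding matrix entry, and pairings that never occur stay at zero. The data layout and function name are assumptions. In the assay itself, the entries are measured pairing by pairing (for example, by LCR as in FIG. 11) rather than computed from whole products.

```python
def pairing_matrix(inputs, products):
    """Accumulate how much of each ordered input pairing the products contain.

    `products` maps a ligation product (a tuple of input labels in 5'->3'
    order) to its concentration. Entry [i][j] sums the concentration of every
    product in which input j immediately follows input i. Pairings never
    observed stay at zero.
    """
    index = {name: k for k, name in enumerate(inputs)}
    n = len(inputs)
    matrix = [[0.0] * n for _ in range(n)]
    for product, conc in products.items():
        for a, b in zip(product, product[1:]):
            matrix[index[a]][index[b]] += conc
    return matrix


if __name__ == "__main__":
    detected = {("A", "B", "C", "D"): 3.0, ("A", "C", "B", "D"): 1.0}
    for row in pairing_matrix(["A", "B", "C", "D"], detected):
        print(row)
```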

The result is a reduced distance matrix with a searchable number of solutions (i.e., ligation products), approximately n³ to n⁴.

In a further embodiment, the sums of each row and each column are made equal by normalizing the amounts against the variation in the initial amounts of probes added, as well as the variation between gels, allowing the optimal solutions to be found by existing algorithms. A preferred embodiment uses a maximum likelihood algorithm in which the largest numbers are chosen sequentially to find the optimal solution.
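Both steps are sketched below in Python under stated assumptions. The first is Sinkhorn-Knopp balancing, a standard iterative row/column scaling, swapped in here as one way to make the row and column sums equal. The second is a greedy readout that starts from a known starting input and repeatedly takes the largest remaining entry in the current row. The greedy walk is one reading of “the largest numbers are chosen sequentially”. It is not guaranteed to find the global optimum, which is why an exact-cost check on the candidate paths remains useful.

```python
def balance(matrix, iterations=500):
    """Sinkhorn-Knopp balancing: rescale rows, then columns, until each sums to 1.

    Zero entries (undetected pairings) stay zero. Convergence assumes the
    nonzero pattern admits a doubly stochastic scaling.
    """
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    for _ in range(iterations):
        for i in range(n):
            s = sum(m[i])
            if s > 0:
                m[i] = [v / s for v in m[i]]
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    m[i][j] /= s
    return m


def greedy_path(matrix, start):
    """From `start`, repeatedly step to the unvisited input with the largest entry."""
    n = len(matrix)
    path = [start]
    while len(path) < n:
        row = matrix[path[-1]]
        unvisited = [j for j in range(n) if j not in path]
        path.append(max(unvisited, key=lambda j: row[j]))
    return path


if __name__ == "__main__":
    raw = [[0, 8, 1, 1], [1, 0, 8, 1], [1, 1, 0, 8], [8, 1, 1, 0]]
    print(greedy_path(balance(raw), 0))
```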

Thus, all input pairings (i.e., input polynucleotides plus connection polynucleotides) are possible, since all of the relevant polynucleotides are added, but not all of the combinations will form; thus, the “reduced distance matrix” has zero values where that particular pairing was not detected.

As will be understood by those of skill in the art, a reduced distance matrix is just one possibility for displaying results; any other suitable display technique can be used, including but not limited to simple lists or values assigned to ordered pairs.

A preferred embodiment uses software to analyze the reduced distance matrix, determine the optimal solutions, and compute the exact cost, to ensure that, of the likely paths found using the DNA computer, the best answer is chosen. (See, for example, FIG. 7.)
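The final exact-cost check is ordinary arithmetic. Here is a minimal Python sketch. It assumes, for illustration, that the costs are available as a distance matrix indexed by input number, and that each candidate path read out of the DNA computer is a list of indices in visiting order.

```python
def tour_cost(path, distance):
    """Exact total cost of visiting `path` in order, using a distance matrix."""
    return sum(distance[a][b] for a, b in zip(path, path[1:]))


def best_candidate(candidates, distance):
    """Among the likely paths read out of the DNA computer, return the cheapest."""
    return min(candidates, key=lambda p: tour_cost(p, distance))


if __name__ == "__main__":
    d = [[0, 1, 4, 3], [1, 0, 2, 5], [4, 2, 0, 1], [3, 5, 1, 0]]
    likely = [[0, 1, 2, 3, 0], [0, 2, 1, 3, 0]]
    winner = best_candidate(likely, d)
    print(winner, tour_cost(winner, d))
```

Because only the few likely paths are scored, this step stays cheap even though enumerating every possible path would not.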

Another embodiment uses the same primers to perform PCR, and determines the order of the input polynucleotides by the different sizes of the PCR products. Since the input polynucleotides are of a specific length, the answer sequences will occur in lengths that are proportional to the input length and the number of input polynucleotides between the two primers.
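The sketch below rests on two simplifying assumptions. First, every input polynucleotide has the same length L. Second, the two primers bind at the outer ends of the two inputs being tested. Under those assumptions, a product spanning k intervening inputs is L*(k+2) nucleotides long, and the Python sketch inverts that relation. Both assumptions and the function name are hypothetical. Inputs of unequal length, such as the 25-mer starting and ending cities of FIG. 10, would need a lookup instead.

```python
def inputs_between(product_length, input_length):
    """Number of input polynucleotides between the two primer-bound inputs.

    Assumes equal-length inputs and primers at the outer ends of the two
    tested inputs, so product_length == input_length * (k + 2).
    """
    units, remainder = divmod(product_length, input_length)
    if remainder or units < 2:
        raise ValueError("length is not a whole number of inputs spanning both primers")
    return units - 2


if __name__ == "__main__":
    for length in (40, 60, 100):
        print(length, inputs_between(length, 20))
```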

Determining a concentration of the individual ligated products can be accomplished by any means known to those of skill in the art, including but not limited to spectrophotometry, gel electrophoresis, surface plasmon resonance, fluorescence detection, radiation detection, and inducement of movement of the polynucleotides (see, for example, WO 2005/080603). In one embodiment, several cycles of amplification are performed, such as by ligase chain reaction (“LCR”), to increase the number of copies of the ligated product in order to increase the ability to detect the product answers. Other techniques, such as PCR, can also be used to amplify the number of copies of the ligated products prior to determining the concentration of individual ligation products.

In one embodiment, determining the concentration of the ligation products includes determining the length of ligation products and the concentration of each ligation product based on length. In another embodiment, ligation products are purified prior to determination of their concentration. Any molecular separation technique can be used, including but not limited to any chromatography technique. In one embodiment, affinity columns are used with a biotinylated DNA probe that is the complement of each input polynucleotide. The biotinylated probe is then attached to an avidin chromatography column to make the affinity column for that input polynucleotide. Thus, sequential affinity chromatography using separate affinity columns for the different input polynucleotides permits purification of individual ligation products that contain each input polynucleotide for which separate affinity chromatography was carried out, and other ligation products can be discarded. Thus, in one embodiment, ligation products that contain each of the input polynucleotides can be selected for. In combination with size selection, individual ligation products with, for example, a single copy of each input polynucleotide can be selected, and others discarded. Those of skill in the art will be able to implement various alternative embodiments based on the teachings herein.

In a further embodiment, avidin-coated magnetic beads are used with a biotinylated DNA probe that is the complement of each input polynucleotide. The biotinylated probe is then attached to an avidin bead to make the purification solution for that input polynucleotide. Thus, sequential purification with separate tagged beads for the different input polynucleotides permits purification of ligation products that contain each input polynucleotide for which separate magnetic bead purification was carried out. Thus, in one embodiment, ligation products that contain each of the input polynucleotides can be selected for. In combination with size selection, ligation products with, for example, a single copy of each input polynucleotide can be selected. Those of skill in the art will be able to implement various alternative embodiments based on the teachings herein.
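The combined effect of sequential capture and size selection can be modeled in silico. In this hypothetical Python sketch, each ligation product is represented as a tuple of input labels. One filter per input stands in for one affinity column or bead set, and a length check stands in for size selection. The sketch models only which products would survive, not capture efficiency or yield.

```python
def purify(products, inputs):
    """In silico model of sequential affinity capture plus size selection.

    Each product is a tuple of input labels. One filter per input stands in
    for one affinity column (or one set of avidin beads carrying that
    input's biotinylated complementary probe); the final length check
    stands in for size selection of products holding one copy of each input.
    """
    kept = list(products)
    for name in inputs:
        kept = [p for p in kept if name in p]
    return [p for p in kept if len(p) == len(inputs)]


if __name__ == "__main__":
    pool = [("A", "B", "C"), ("A", "B", "A"), ("A", "B", "C", "B"), ("C", "A", "B")]
    print(purify(pool, ["A", "B", "C"]))
```

A product that has the right length but repeats one input fails the capture for the missing input. A product that contains every input plus an extra copy passes capture but fails size selection.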

As discussed above, the methods of the invention can be used to solve any type of NP optimization problem, including but not limited to the following (see www.en.wikipedia.org/wiki/NP-complete, incorporated by reference herein in its entirety):

(A) Graph Theory Problems, Including but not Limited to

Covering and partitioning problems (including but not limited to the following problem types: vertex cover; dominating set; domatic number; graph k-colorability; achromatic number; monochromatic triangle; feedback vertex set; feedback arc set; partial feedback edge set; minimum maximal matching; partition into triangles; partition into isomorphic subgraphs; partition into Hamiltonian subgraphs; partition into forests; partition into cliques; partition into perfect matchings; two-stage maximum weight stochastic matching; covering by cliques; and covering by complete bipartite subgraphs);

Subgraph and supergraph problems (including but not limited to the following problem types: clique; independent set; induced subgraph with property Pi; induced connected subgraph with property Pi; induced path; balanced complete bipartite subgraph; bipartite subgraph; degree-bounded connected subgraph; planar subgraph; edge-subgraph; transitive subgraph; uniconnected subgraph; minimum k-connected subgraph; cubic subgraph; minimum equivalent digraph; Hamiltonian completion; interval graph completion; and path graph completion);

Vertex ordering problems (including but not limited to the following problem types: Hamiltonian circuit; directed Hamiltonian circuit; Hamiltonian path; bandwidth; directed bandwidth; optimal linear arrangement; directed optimal linear arrangement; minimum cut linear arrangement; rooted tree arrangement; directed elimination ordering; and elimination degree sequence);

Isomorphism and other morphism problems (including but not limited to the following problem types: subgraph isomorphism; largest common subgraph; maximum subgraph matching; graph contractability; graph homomorphism; and digraph D-morphism); path with forbidden pairs; multiple choice matching; graph Grundy numbering; kernel; K-closure; intersection graph basis; path distinguishers; metric dimension; Nesetril-Rödl dimension; threshold number; oriented diameter; and weighted diameter.

(B) Network Design Problems, Including but not Limited to

Spanning tree problems (including but not limited to the following problem types: degree constrained spanning tree; maximum leaf spanning tree; shortest total path length spanning tree; bounded diameter spanning tree; capacitated spanning tree; geometric capacitated spanning tree; optimum communication spanning tree; isomorphic spanning tree; Kth best spanning tree; bounded component spanning forest; multiple choice branching; Steiner tree; geometric Steiner tree; and cable trench);

Cuts and connectivity problems (including but not limited to the following problem types: graph partitioning; acyclic partition; max weight cut; minimum cut into bounded sets; biconnectivity augmentation; strong connectivity augmentation; network reliability; network survivability; multiway cut; and minimum k-cut);

Routing problems (including but not limited to the following problem types: bottleneck traveling salesman; Chinese postman for mixed graphs; Euclidean traveling salesman; K most vital arcs; Kth shortest path; metric traveling salesman; longest circuit; longest path; prize collecting traveling salesman; rural postman; shortest path in general networks; shortest weight-constrained path; stacker-crane; time constrained traveling salesman feasibility; traveling salesman; and vehicle routing);

Flow problems (including but not limited to the following problem types: minimum edge-cost flow; integral flow with multipliers; path constrained network flow; integral flow with homologous arcs; integral flow with bundles; undirected flow with lower bounds; directed two-commodity integral flow; undirected two-commodity integral flow; disjoint connecting paths; maximum length-bounded disjoint paths; and maximum fixed-length disjoint paths); quadratic assignment problem; minimizing dummy activities in PERT networks; constrained triangulation; intersection graph for segments on a grid; edge embedding on a grid; geometric connected dominating set; minimum broadcast time; min-max multicenter; min-sum multicenter; uncapacitated facility location; and metric k-center;

(C) Sets and Partitions Problems, Including but not Limited to

Covering, hitting, and splitting problems (including but not limited to the following problem types: 3-dimensional matching; exact cover; set packing; set splitting; minimum cover; minimum test set; set basis; hitting set; intersection pattern; comparative containment; and 3-matroid intersection);

Weighted set problems (including but not limited to the following problem types: partition; subset sum; subset product; 3-partition; numerical 3-dimensional matching; numerical matching with target sums; expected component sum; minimum sum of squares; Kth largest subset; and Kth largest m-tuple);

(D) Storage and Retrieval Problems, Including but not Limited to

Data storage problems (including but not limited to the following problem types: bin packing; dynamic storage allocation; pruned trie space minimization; expected retrieval cost; rooted tree storage assignment; multiple copy file allocation; and capacity assignment);

Compression and representation problems (including but not limited to the following problem types: shortest common supersequence; shortest common superstring; longest common subsequence; bounded Post correspondence problem; hitting string; sparse matrix compression; consecutive ones submatrix; consecutive ones matrix partition; consecutive ones matrix augmentation; consecutive block minimization; consecutive sets; 2-dimensional consecutive sets; string-to-string correction; grouping by swapping; external macro data compression; internal macro data compression; regular expression substitution; rectilinear picture compression; optimal vector quantization codebook; and minimal grammar-based compression);

Database problems (including but not limited to the following problem types: minimum cardinality key; additional key; prime attribute name; Boyce-Codd normal form violation; conjunctive query foldability; conjunctive boolean query; tableau equivalence; serializability of database histories; safety of database transaction systems; consistency of database frequency tables; and safety of file protection systems);

(E) Sequencing and Scheduling Problems, Including but not Limited to

Sequencing on one processor problems (including but not limited to the following problem types: sequencing with release times and deadlines; sequencing to minimize tardy tasks; sequencing to minimize tardy weight; sequencing to minimize weighted completion time; sequencing to minimize weighted tardiness; sequencing with deadlines and set-up times; and sequencing to minimize maximum cumulative cost);

Multiprocessor scheduling problems (including but not limited to the following problem types: multiprocessor scheduling; precedence constrained scheduling; resource constrained scheduling; scheduling with individual deadlines; preemptive scheduling; and scheduling to minimize weighted completion time);

Shop scheduling problems (including but not limited to the following problem types: open-shop scheduling; flow-shop scheduling; no-wait flow-shop scheduling; two-processor flow-shop with bounded buffer; and job-shop scheduling); timetable design problems; staff scheduling problems; production planning problems; and deadlock avoidance problems;

(F) Mathematical Programming Problems (Including but not Limited to the Following Problem Types:

integer programming; 0-1 integer programming; quadratic programming; cost-parametric linear programming; feasible basis extension; minimum weight solution to linear equations; open hemisphere; K-relevancy; traveling salesman polytope non-adjacency; knapsack; integer knapsack; continuous multiple choice knapsack; partially ordered knapsack; and comparative vector inequalities);

(G) Algebra and Number Theory Problems, Including but not Limited to

Divisibility problems (including but not limited to the following problem types: quadratic congruences; simultaneous incongruences; simultaneous divisibility of linear polynomials; comparative divisibility; exponential expression divisibility; non-divisibility of a product polynomial; and non-trivial greatest common divisor);

Solvability of equations (including but not limited to the following problem types: quadratic diophantine equations; algebraic equations over GF[2]; root of modulus 1; number of roots for a product polynomial; and periodic solution recurrence relation); permanent evaluation; cosine product integration; equilibrium point; unification with commutative operators; unification for finitely presented algebras; and integer expression membership;

Games and puzzles, including but not limited to generalized hex; generalized geography; generalized Kayles; sequential truth assignment; variable partition truth assignment; sift; alternating hitting set; alternating maximum weighted matching; annihilation; left-right Hackenbush for redwood furniture; square-tiling; crossword puzzle construction; generalized instant insanity; Minesweeper consistency problem; Sudoku™; Nurikabe™; paint by numbers; light up; Slither™ link; Clickomania™; Tetris™; and Mastermind™;

(J) Program Optimization, Including but not Limited to

Code generation problems (including but not limited to the following problem types: register sufficiency; feasible register assignment; register sufficiency for loops; code generation on a one-register machine; code generation with unlimited registers; code generation for parallel assignments; code generation with address expressions; code generation with unfixed variable locations; ensemble computation; and microcode bit optimization);

Program and scheme problems (including but not limited to the following problem types: inequivalence of programs with arrays; inequivalence of programs with assignments; inequivalence of finite memory programs; inequivalence of loop programs without nesting; inequivalence of simple functions; strong inequivalence of Ianov schemes; strong inequivalence for monadic recursion; non-containment for free B-schemes; non-freedom for loop-free program schemes; and programs with formally recursive procedures); cyclic ordering problems; non-liveness of free choice Petri net problems; reachability for 1-conservative Petri net problems; finite function generation problems; permutation generation problems; decoding of linear code problems; Shapley-Shubik voting power problems; clustering problems; randomization test for matched pair problems; maximum likelihood ranking problems; matrix domination problems; matrix cover problems; simply deviated disjunction problems; decision tree problems; minimum weight and/or graph solution problems; fault detection in logic circuit problems; fault detection in directed graph problems; and fault detection with test point problems.

Further details of some of these NP optimization problems are presented below (taken from http://en.wikipedia.org/wiki/NP-complete; details of many of the other problems can also be found at this site):

Graph Theory

In mathematics and computer science, graph theory studies the properties of graphs. Informally, a graph is a set of objects called vertices (or nodes) connected by links called edges (or arcs), which can be directed (assigned a direction). Typically, a graph is depicted as a set of dots (the vertices) connected by lines (the edges). Structures that can be represented as graphs are ubiquitous, and many problems of practical interest can be represented by graphs. For example, the link structure of a website could be represented by a directed graph: the vertices are the web pages available at the website, and there is a directed edge from page A to page B if and only if A contains a link to B. The development of algorithms to handle graphs is therefore of major interest in computer science.

A graph structure can be extended by assigning a weight to each edge. Graphs with weights can be used to represent many different concepts; for example, if the graph represents a road network, the weights could represent the length of each road. Another way to extend basic graphs is by making the edges of the graph directional (A links to B, but B does not necessarily link to A, as in web pages), technically called a directed graph or digraph. A digraph with weighted edges is called a network.

Networks have many uses in network analysis, the practical side of graph theory (for example, to model and analyze traffic networks or to discover the shape of the internet).

Network analysis is the analysis of networks through network theory (or, more generally, graph theory). The networks may be social, transportation, or virtual, such as the internet. Analyses include descriptions of structure, such as small-world networks or scale-free networks; optimization, such as Critical Path Analysis and PERT (Program Evaluation & Review Technique); and properties such as flow assignment.

Social network analysis maps relationships between individuals in social networks. Network analysis, and its close cousin traffic analysis, has significant use in intelligence. By monitoring the communication patterns between the network nodes, the structure of the network can be established. This can be used for uncovering insurgent networks of both hierarchical and leaderless nature.

Link analysis is a subset of network analysis that explores associations between objects. An example is examining the addresses of suspects and victims, the telephone numbers they dialed, the financial transactions they took part in during a given timeframe, and the familial relationships between these subjects as part of a police investigation. Link analysis here provides the crucial relationships and associations between very many objects of different types that are not apparent from isolated pieces of information. Computer-assisted or fully automatic computer-based link analysis is increasingly employed by banks and insurance agencies in fraud detection, by telecommunication operators in telecommunication network analysis, by the medical sector in epidemiology and pharmacology, in law enforcement investigations, by search engines for relevance rating (and conversely by spammers for spamdexing and by business owners for search engine optimization), and everywhere else where relationships between many objects have to be analyzed.

Centrality Measures

Information about the relative importance of nodes and edges in a graph can be obtained through centrality measures. For example, eigenvector centrality uses the eigenvectors of the adjacency matrix to determine nodes that tend to be frequently visited. An example is the PageRank algorithm used by Google: the principal eigenvector of the modified adjacency matrix of the www-graph gives the page ranks as its components.
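The eigenvector computation described above can be illustrated with power iteration. The following is a minimal sketch, not Google's implementation: it assumes the common formulation in which the "modified" adjacency matrix mixes following a random outgoing link (with an assumed damping factor of 0.85) with a uniform jump to any page. The function name `pagerank` and the toy link graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank (illustrative sketch).

    links maps each page to the list of pages it links to. With probability
    `damping` a surfer follows a random outgoing link; otherwise it jumps to a
    uniformly random page. Pages without outgoing links spread their rank
    evenly over all pages.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                for q in pages:  # dangling page: distribute uniformly
                    new[q] += damping * rank[p] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the vector has converged
            break
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(ranks)
```

For this three-page example, C receives the highest rank because it collects links from both other pages, and the ranks sum to 1 at every iteration.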

Fifteen puzzle: The n-puzzle is known in various versions, including the 8 puzzle and the 15 puzzle, and under various names. It is a sliding puzzle that consists of a grid of numbered squares with one square missing and the labels on the squares jumbled up. If the grid is 3×3, the puzzle is called the 8-puzzle or 9-puzzle. If the grid is 4×4, the puzzle is called the 15-puzzle or 16-puzzle. The goal of the puzzle is to un-jumble the squares by making only moves that slide a square into the empty space, which in turn reveals another empty space in the position of the moved piece.

The n-puzzle is a classical problem for modelling algorithms involving heuristics. Commonly used heuristics for this problem include counting the number of misplaced tiles and finding the sum of the Manhattan distances between each block and its position in the goal configuration. Both heuristics are admissible, i.e., they never overestimate the number of moves left, which ensures optimality for certain search algorithms such as A*.
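Both heuristics can be computed directly from a board configuration. In this minimal sketch, boards are assumed to be flat tuples in row-major order with 0 marking the empty square; the function names are illustrative.

```python
def misplaced_tiles(board, goal):
    """Number of tiles (the blank, 0, is ignored) not in their goal cell."""
    return sum(1 for tile, target in zip(board, goal) if tile != 0 and tile != target)


def manhattan_distance(board, goal, width):
    """Sum over tiles of the row distance plus column distance to the goal cell."""
    goal_index = {tile: i for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(board):
        if tile == 0:
            continue  # the blank is not a tile
        j = goal_index[tile]
        total += abs(i // width - j // width) + abs(i % width - j % width)
    return total


goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
board = (1, 2, 3, 4, 5, 6, 0, 7, 8)
print(misplaced_tiles(board, goal), manhattan_distance(board, goal, 3))  # 2 2
```

Here both heuristics return 2, which matches the two moves actually needed (slide 7 left, then 8 left); each move changes one tile's Manhattan distance by exactly 1, which is why that heuristic never overestimates.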

It is possible to use parity arguments to show that some starting positions for the n-puzzle are impossible to resolve, no matter how many moves are made. This is done by considering a function of the tile configuration that is invariant under any valid move, and then using this to partition the space of all possible labelled states into equivalence classes of reachable and unreachable states.
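The text does not name the invariant; a standard choice is the parity of the number of inversions (pairs of tiles that appear in the wrong relative order), combined with the row of the empty square when the grid width is even. The sketch below assumes a square board, the same flat-tuple encoding with 0 as the blank, and a goal of 1, 2, ..., N with the blank in the last cell; the function names are illustrative.

```python
def inversions(board):
    """Count pairs of tiles (blank excluded) that appear in the wrong relative order."""
    tiles = [t for t in board if t != 0]
    return sum(
        1
        for i in range(len(tiles))
        for j in range(i + 1, len(tiles))
        if tiles[i] > tiles[j]
    )


def is_solvable(board, width):
    """Parity test for a square width x width board whose goal is 1..N, blank last.

    Odd width: a move never changes inversion parity, so it must be even.
    Even width: inversion parity plus the blank's row (counted from the bottom,
    starting at 1) is invariant, and that sum is odd in the goal configuration.
    """
    inv = inversions(board)
    if width % 2 == 1:
        return inv % 2 == 0
    blank_row_from_bottom = width - board.index(0) // width
    return (inv + blank_row_from_bottom) % 2 == 1


swapped = tuple(range(1, 14)) + (15, 14, 0)  # solved 15-puzzle with 14 and 15 exchanged
print(is_solvable(swapped, 4))  # False
```

Swapping tiles 14 and 15 in an otherwise solved 15-puzzle gives a single inversion with the blank on the bottom row, so that position falls in the unreachable class.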

Knapsack problem: The knapsack problem is a problem in combinatorial optimization. It derives its name from the maximization problem of choosing as many essentials as possible that can fit into one bag (of maximum weight) to be carried on a trip. A similar problem very often appears in business, combinatorics, complexity theory, cryptography, and applied mathematics. Given a set of items, each with a cost and a value, determine the number of each item to include in a collection so that the total cost does not exceed a given limit and the total value is as large as possible.

The decision problem form of the knapsack problem is the question “can a value of at least V be achieved without exceeding the cost C?”
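The optimization and decision forms above can be connected with a short dynamic program. This sketch assumes positive integer costs and, following the phrase "the number of each item", lets each item be chosen any number of times (the unbounded variant); the 0-1 variant needs a different loop structure. The function names are illustrative.

```python
def max_value(items, budget):
    """Unbounded knapsack. items: (cost, value) pairs with positive integer
    costs; each item may be taken any number of times. Returns the largest
    total value whose total cost does not exceed budget."""
    best = [0] * (budget + 1)  # best[c]: best value with total cost <= c
    for c in range(1, budget + 1):
        for cost, value in items:
            if cost <= c:
                best[c] = max(best[c], best[c - cost] + value)
    return best[budget]


def knapsack_decision(items, budget, target):
    """Decision form: can a value of at least `target` be reached within `budget`?"""
    return max_value(items, budget) >= target


items = [(5, 10), (4, 40), (6, 30), (3, 50)]
print(max_value(items, 10))  # 150: three copies of the (3, 50) item
```

The running time grows with the number of items times the cost limit C, which is pseudo-polynomial (polynomial in the numeric value of C, not in the number of bits needed to write it), so this does not contradict the problem's NP-completeness.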

Hamiltonian cycle problem: In the mathematical field of graph theory, the Hamiltonian path problem and the Hamiltonian cycle problem are problems of determining whether a Hamiltonian path or a Hamiltonian cycle (a path or cycle that visits each vertex exactly once) exists in a given graph (whether directed or undirected). Both problems are NP-complete. The problem of finding a Hamiltonian cycle or path is in FNP.

There is a simple relation between the two problems. The Hamiltonian path problem for graph G is equivalent to the Hamiltonian cycle problem in a graph H obtained from G by adding a new vertex and connecting it to all vertices of G.
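The reduction can be checked on small graphs by brute force. This sketch enumerates vertex orderings (exponential time, so it is only suitable for a handful of vertices), labels vertices 0..n-1, treats edges as undirected, and assumes G has at least two vertices; the function names are hypothetical.

```python
from itertools import permutations


def has_hamiltonian_cycle(n, edges):
    """Brute force: vertices 0..n-1, undirected edges given as pairs."""
    if n < 3:
        return False  # a simple cycle needs at least three vertices
    adjacent = {frozenset(e) for e in edges}
    for rest in permutations(range(1, n)):  # fix vertex 0 as the start
        order = (0,) + rest
        if all(frozenset((order[i], order[(i + 1) % n])) in adjacent for i in range(n)):
            return True
    return False


def has_hamiltonian_path(n, edges):
    """Reduction from the text: add a new vertex (labelled n) joined to every
    vertex of G, then ask for a Hamiltonian cycle in the enlarged graph H."""
    h_edges = list(edges) + [(v, n) for v in range(n)]
    return has_hamiltonian_cycle(n + 1, h_edges)


star = [(0, 1), (0, 2), (0, 3)]
print(has_hamiltonian_path(4, star))  # False
```

A star with one center and three leaves has no Hamiltonian path, and correspondingly the augmented graph H has no Hamiltonian cycle: every leaf has degree 2 in H, which would force the center onto three cycle edges.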

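The Hamiltonian cycle problem is a special case of the traveling salesman problem, obtained by setting the distance between two cities to unity if they are adjacent and infinity otherwise; a Hamiltonian cycle then exists if and only if the cheapest round trip has a total length equal to the number of vertices.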

Traveling salesman problem: Given a number of cities and the costs of traveling from any city to any other city, what is the cheapest round-trip route that visits each city once and then returns to the starting city?

An equivalent formulation in terms of graph theory is: given a complete weighted graph (where the vertices represent the cities, the edges represent the roads, and the weights are the cost or distance of each road), find the Hamiltonian cycle with the least weight.

It can be shown that the requirement of returning to the starting city does not change the computational complexity of the problem.

A related problem is the bottleneck traveling salesman problem (bottleneck TSP): find the Hamiltonian cycle in a weighted graph that minimizes the length of its longest edge.
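The classic and bottleneck objectives differ only in how the edge costs along a tour are combined (sum versus maximum). The brute-force sketch below makes that explicit and also implements the unity/infinity construction for Hamiltonian cycles described earlier. It fixes city 0 as the start, allows asymmetric costs, and enumerates all (n-1)! orderings, so it is a correctness illustration only; the names are hypothetical.

```python
import math
from itertools import permutations


def _tours(n):
    """Every round trip that starts and ends at city 0."""
    for rest in permutations(range(1, n)):
        yield (0,) + rest + (0,)


def _best_tour(cost, combine):
    """Return (objective, tour) minimizing combine(edge costs along the tour)."""
    best = None
    for tour in _tours(len(cost)):
        value = combine(cost[a][b] for a, b in zip(tour, tour[1:]))
        if best is None or value < best[0]:
            best = (value, tour)
    return best


def tsp(cost):
    """Classic TSP: minimize total cost; cost[i][j] may differ from cost[j][i]."""
    return _best_tour(cost, sum)


def bottleneck_tsp(cost):
    """Bottleneck TSP: minimize the largest single edge cost on the tour."""
    return _best_tour(cost, max)


def hamiltonian_cycle_via_tsp(n, edges):
    """Distance 1 between adjacent vertices, infinity otherwise: a Hamiltonian
    cycle exists exactly when the cheapest tour costs n (assumes n >= 3)."""
    adjacent = {frozenset(e) for e in edges}
    cost = [[1 if frozenset((i, j)) in adjacent else math.inf for j in range(n)]
            for i in range(n)]
    return tsp(cost)[0] == n


cost = [[0, 1, 4, 2],
        [1, 0, 2, 5],
        [4, 2, 0, 1],
        [2, 5, 1, 0]]
print(tsp(cost), bottleneck_tsp(cost))
```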

The problem is of considerable practical importance beyond the evident transportation and logistics areas. A classic example is printed circuit manufacturing: scheduling the route of the drill machine to drill holes in a PCB. In robotic machining or drilling applications, the “cities” are parts to machine or holes (of different sizes) to drill, and the “cost of travel” includes time for retooling the robot (single machine job sequencing problem).

Clique problem: A clique in a graph is a set of pairwise adjacent vertices, or in other words, an induced subgraph that is a complete graph. For example, vertices 1, 2 and 5 form a clique if each has an edge to the other two. The clique problem is then the problem of determining whether a graph contains a clique of at least a given size k. Once k or more vertices that form a clique have been located, it is trivial to verify that they do, which is why the clique problem is in NP. The corresponding optimization problem, the maximum clique problem, is to find the largest clique in a graph.

Vertex cover problem: A vertex cover of an undirected graph G=(V,E) is a subset V′ of the vertices of the graph that contains at least one of the two endpoints of each edge:

$$V' \subseteq V \;:\; \forall \{a, b\} \in E : a \in V' \lor b \in V'.$$

The vertex cover problem is the optimization problem of finding a vertex cover of minimum size in a graph. The problem can also be stated as a decision problem:

INSTANCE: A graph G and a positive integer k.

QUESTION: Is there a vertex cover of size k or less for G?
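A direct, exponential-time answer to this decision question can be sketched by testing every subset of at most k vertices against the covering condition (every edge has an endpoint in the subset). The function names are illustrative; vertices may be any hashable labels.

```python
from itertools import combinations


def is_vertex_cover(cover, edges):
    """Every edge {a, b} must have a in the cover or b in the cover."""
    return all(a in cover or b in cover for a, b in edges)


def vertex_cover_decision(vertices, edges, k):
    """INSTANCE: graph (vertices, edges) and k. QUESTION: is there a vertex
    cover of size k or less? Exhaustive search, exponential in general."""
    for size in range(min(k, len(vertices)) + 1):
        for subset in combinations(vertices, size):
            if is_vertex_cover(set(subset), edges):
                return True
    return False


triangle = [(1, 2), (2, 3), (1, 3)]
print(vertex_cover_decision([1, 2, 3], triangle, 1))  # False
print(vertex_cover_decision([1, 2, 3], triangle, 2))  # True
```

Checking a proposed cover with `is_vertex_cover` takes time linear in the number of edges; only the search over subsets is expensive.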

Independent set problem: Given a graph G, an independent set is a subset of its vertices that are pairwise non-adjacent. In other words, the subgraph induced by these vertices has no edges, only isolated vertices. The independent set problem then asks: given a graph G and an integer k, does G have an independent set of size at least k? The corresponding optimization problem is the maximum independent set problem, which seeks the largest independent set in a graph.

Graph coloring problem: In graph theory, graph coloring is an assignment of “colors”, almost always taken to be consecutive integers starting from 1 without loss of generality, to certain objects in a graph. Such objects can be vertices, edges, faces, or a mixture of these. Vertex coloring is the most important kind, not only because it is the starting point of the entire subject, but also because other coloring problems can be transformed into a vertex version. For example, an edge coloring of a graph is just a vertex coloring of its line graph. Likewise, a face coloring of a planar graph is just a vertex coloring of its (planar) dual. However, to keep things in perspective, non-vertex coloring problems are usually stated and studied as is. Graph coloring has many practical applications as well as theoretical challenges. Besides the classical types of problems, different limitations can also be set on the graph, on the way a color is assigned, or even on the color itself. It has even reached popularity with the general public in the form of the popular number puzzle Sudoku. Graph coloring is still a very active field of research.
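Membership in NP rests on fast verification: given a proposed answer, checking it takes polynomial time even though finding it may not, as noted above for the clique problem. The sketch below gives such verifiers for a clique, an independent set, and a proper vertex coloring; it assumes an undirected edge list and uses illustrative function names.

```python
from itertools import combinations


def is_clique(vertices, edges):
    """Every pair of the chosen vertices must be joined by an edge."""
    adjacent = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in adjacent for u, v in combinations(vertices, 2))


def is_independent_set(vertices, edges):
    """No pair of the chosen vertices may be joined by an edge."""
    adjacent = {frozenset(e) for e in edges}
    return not any(frozenset((u, v)) in adjacent for u, v in combinations(vertices, 2))


def is_proper_coloring(coloring, edges):
    """coloring maps each vertex to a color (e.g. 1, 2, 3, ...); the two
    endpoints of every edge must receive different colors."""
    return all(coloring[a] != coloring[b] for a, b in edges)


edges = [(1, 2), (2, 5), (1, 5), (5, 3)]
print(is_clique([1, 2, 5], edges), is_independent_set([1, 3], edges))  # True True
```

In this graph, vertices 1, 2 and 5 form a clique, vertices 1 and 3 form an independent set, and the coloring {1: 1, 2: 2, 5: 3, 3: 1} is proper.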

In another aspect, the present invention provides computer readable storage media comprising a set of instructions for causing a processing device to execute procedures for carrying out the methods of the invention disclosed above. The computer readable storage medium can include, but is not limited to, magnetic disks, optical disks, organic memory, and any other volatile (e.g., Random Access Memory (“RAM”)) or non-volatile (e.g., Read-Only Memory (“ROM”)) mass storage system readable by a central processing unit (“CPU”). The computer readable storage medium includes cooperating or interconnected computer readable media, which can exist exclusively on the processing system of the processing device or be distributed among multiple interconnected processing systems that may be local or remote to the processing device.

In a preferred embodiment, the computer readable storage medium comprises a set of instructions for causing a processing device (such as a computer) to execute the methods of the invention. In one embodiment, the process is completely automated, wherein a user selects input and connection oligonucleotide sequences; the computer readable storage medium thus provides instructions to a processing device to cause a nucleic acid synthesizer to effect synthesis of the input and connection oligonucleotides. The user provides parameters for combining the input and connection polynucleotides; the computer readable storage medium thus provides instructions to a processing device to cause, for example, an automated microplate system comprising a robotic arm to automatically carry out the desired combinations under the desired conditions, including ligation. The computer readable storage medium further provides instructions to a processing device to cause a device to identify individual oligonucleotide products and determine their concentration, wherein the ligation products present at the highest concentration represent optimal solutions to the nondeterministic polynomial optimization problem. Embodiments of the instructions include all of those discussed above for the methods of the invention.

In another embodiment, only certain portions of the method are automated.

FIG. 7 provides an exemplary flow chart for one such embodiment. For example, the computer readable storage medium provides instructions for causing a processing device to determine a reduced distance matrix from a set of ordered pairs, as disclosed above. The computer readable storage medium may further provide instructions for causing a processing device to analyze the reduced distance matrix, determine the optimal solutions, and compute the exact cost to ensure that, of the likely paths found using the DNA computer, the best answer is chosen. It will be understood by those of skill in the art that all of these steps after ordered pair analysis are optional, and are unnecessary where ordered pair analysis can be performed by visual inspection, as in Example 3 below.

The present invention may be better understood with reference to the accompanying examples, which are intended for purposes of illustration only and should not be construed to limit the scope of the invention.

EXAMPLE 1 General Methodology to Solve the Traveling Salesman Problem

Step 1: Create a Set of Unique DNA Sequences for Each City (“Input Polynucleotides” or “City Sequences”) and for Each Path that Connects Cities (“Connection Polynucleotides” or “Pathway Sequences”)

In this example, 20-nucleotide sequences that correspond to each vertex (city) in the graph (city sequences) are prepared (FIGS. 1A and 1B). FIG. 1 corresponds to a 4-city problem for the purpose of illustrating the concept. Additionally, 20-mer sequences are created to represent the edges (pathway sequences) in the graph (FIG. 1C). If two city sequences, A and B, are composed of the 10-mer pairs x₁y₁ and x₂y₂, respectively (i.e., each of the “x” and “y” portions is 10 nucleotides in length), then an edge or path, P_AB, from A to B is a 20-mer composed of the Watson-Crick complement of y₁x₂. Thus, when put in solution, the path sequence binds to the second half of the source city and the first half of the destination city, serving as a connector for binding the cities together (FIGS. 1D and 1E).
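The connector construction can be sketched in a few lines. This is an illustration under assumptions: sequences are written 5'→3' using the letters A, C, G and T, and the connector is returned as the reverse complement of y₁x₂ (the 5'→3' form in which it would typically be ordered), since the original strand-orientation notation is not recoverable here. The function names and example sequences are hypothetical and are not the sequences used in the examples.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Watson-Crick partner of seq, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def pathway_sequence(city_a, city_b, half=10):
    """Connector for the edge A -> B.

    city_a = x1 + y1 and city_b = x2 + y2, each part `half` nucleotides long.
    The connector pairs with y1 followed by x2, so it is returned as the
    reverse complement of y1 + x2.
    """
    y1 = city_a[half:]   # second half of the source city
    x2 = city_b[:half]   # first half of the destination city
    return reverse_complement(y1 + x2)


city_a = "AAAAACCCCCGGGGGTTTTT"  # hypothetical 20-mers, not the published sequences
city_b = "ACACACACACGTGTGTGTGT"
p_ab = pathway_sequence(city_a, city_b)
print(p_ab)  # GTGTGTGTGTAAAAACCCCC
```

Because the reverse complement of the connector is y₁x₂, the connector pairs with the last 10 nucleotides of A and the first 10 nucleotides of B, splinting the two city sequences end to end for ligation.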

Step 2: Hybridize City and Pathway DNA Sequences Based on a Distance Matrix Between Cities.

City sequences are added in saturating amounts to the hybridization mix, while the limiting amounts of pathway sequences are determined by the distance matrix. This generates DNA strands that incorporate some or all of the cities. Optimal paths through the graph are generated by combining many copies of the city sequences and various concentrations of pathway sequences into a solution, letting them hybridize (FIG. 1F), and ligating the products. Ligation products encoding the most probable visiting orders, which correspond to the most efficient routes, are made at the highest concentrations, and many poor solution sequences may not be formed at all because they occur with such low probability. Since only some of the best solutions are generated, it is not necessary to start with enough DNA to generate all possible answers; this eliminates the staggering amount of DNA needed to compute all possible solutions to problems with more than 20 cities. This allows us to generate an optimal subset of all possible solutions, avoiding scalability problems identified in the art.
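The statistical idea in this step, that high-probability routes are over-represented instead of every route being built, can be illustrated with a toy Monte Carlo model. The model is an assumption made for illustration and does not describe the actual hybridization chemistry: each growing strand extends from its current city to an unvisited city with probability proportional to the corresponding connector concentration, and the closing step back to the start city is not weighted. The names and parameters (`simulate_assembly`, `trials`, `seed`) are hypothetical.

```python
import random
from collections import Counter


def simulate_assembly(conc, start=0, trials=10000, seed=1):
    """Toy model (an assumption, not the chemistry): each strand grows from its
    current city to an unvisited city j with probability proportional to
    conc[current][j]; the final return to `start` is not weighted.
    Returns a Counter of complete tours."""
    rng = random.Random(seed)
    n = len(conc)
    counts = Counter()
    for _ in range(trials):
        tour = [start]
        unvisited = sorted(set(range(n)) - {start})
        while unvisited:
            current = tour[-1]
            weights = [conc[current][j] for j in unvisited]
            nxt = rng.choices(unvisited, weights=weights)[0]
            tour.append(nxt)
            unvisited.remove(nxt)
        tour.append(start)
        counts[tuple(tour)] += 1
    return counts


conc = [[0, 9, 1],
        [1, 0, 9],
        [9, 1, 0]]
print(simulate_assembly(conc).most_common(2))
```

With concentrations that strongly favor 0→1→2, the tour (0, 1, 2, 0) dominates the tally; the relative counts, not an exhaustive enumeration, identify the preferred route.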

Step 3: Removal of Answer Sequences (“Ligation Products”) Lacking a City Sequence.

An answer sequence is clearly incorrect if it is longer or shorter than the length defined by the number of cities. For the 10-city problem, correct answers are 220 base pair DNA oligomers, because each answer starts and ends with City A (11 city sequences in total) and each city sequence is a 20-mer.
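The expected product length follows from simple counting, where n is the number of cities and ℓ is the length of each city sequence; the start city is counted twice because it appears at both ends:

```latex
L = (n + 1)\,\ell = (10 + 1) \times 20 = 220 \text{ nucleotides}
```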

1. The DNA strands are separated by length using gel electrophoresis, and all 220-mers are excised from the gel (FIG. 3). Answer sequences that are too short or too long are either missing cities or have multiple copies of the same city; they are thus removed.

2. Answer sequences are sequentially probed with the DNA complement to a city sequence tethered to magnetic beads; in this case, 10 sequential probes, one for each of the 10 cities (FIG. 4). This removes answer sequences that lack a city sequence.
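The two selection steps can be mimicked in silico to show their combined effect. In this sketch each ligation product is represented by the sequence of city labels it encodes (an assumption for illustration), length selection keeps products of (n + 1) × 20 nucleotides, and one filter per city plays the role of each sequential bead pull. The names are hypothetical.

```python
def select_answers(products, cities, city_len=20):
    """products: sequences of city labels, one per ligation product.

    Step 1 (gel): keep products whose length equals (number of cities + 1)
    city sequences. Step 2 (beads): for each city in turn, keep only the
    products that contain it.
    """
    expected_len = (len(cities) + 1) * city_len
    kept = [p for p in products if len(p) * city_len == expected_len]
    for city in cities:  # one sequential probe per city
        kept = [p for p in kept if city in p]
    return kept


products = [
    ("A", "B", "C", "D", "A"),  # correct length, every city present
    ("A", "B", "C", "A"),       # too short: removed by size selection
    ("A", "B", "B", "D", "A"),  # right length but no C: removed by the C probe
    ("A", "C", "B", "D", "A"),  # correct length, every city present
]
print(select_answers(products, "ABCD"))
```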

Step 4: Determine Abundance of Adjacent City Sequences to Read the Output.

Biotinylated DNA probes that serve as the complement to each city sequence are added to the answer sequences to build DNA bridges via hybridization and ligation using the molecular motor methodology developed in this laboratory (see, for example, WO 2005/080603, incorporated by reference herein in its entirety). The 3′ biotinylated probe for City A is added to the answer solution first with the 5′ biotinylated probe for City B, followed by ligation to build DNA bridges (FIG. 5). Quantitation of these bridges provides information concerning how often City A precedes City B in the answer set. This is repeated for City A with all of the other cities, and ultimately for all possible city pairings.
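The bridge quantitation yields, in effect, a count of how often each ordered pair of cities is adjacent in the answer set. The sketch below computes those counts from city-label sequences and then reads out a route greedily by following the most abundant pair from the start city. The greedy readout is an assumption for illustration, not necessarily the analysis used in the examples, and the function names are hypothetical.

```python
from collections import Counter


def adjacent_pair_counts(answers):
    """Count how often city X is immediately followed by city Y across answers."""
    counts = Counter()
    for tour in answers:
        counts.update(zip(tour, tour[1:]))
    return counts


def greedy_readout(counts, cities, start):
    """Hypothetical readout: from `start`, repeatedly move to the unvisited city
    joined by the most abundant bridge (ties broken alphabetically)."""
    route = [start]
    remaining = sorted(set(cities) - {start})
    while remaining:
        nxt = max(remaining, key=lambda c: counts[(route[-1], c)])
        route.append(nxt)
        remaining.remove(nxt)
    return route + [start]


answers = [("A", "B", "C", "A")] * 3 + [("A", "C", "B", "A")]
counts = adjacent_pair_counts(answers)
print(counts[("A", "B")], counts[("A", "C")])  # 3 1
print(greedy_readout(counts, "ABC", start="A"))  # ['A', 'B', 'C', 'A']
```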

EXAMPLE 2 Application of the General Method

A set of 10 unique 20-mer city sequences (input polynucleotides) was designed to represent each city and synthesized by Invitrogen Corporation. An additional 90 pathway sequences (connection polynucleotides) were synthesized containing all possible combinations of the complementary sequences to join any two city sequences together. The polynucleotide sequences were designed to minimize cross-hybridization, self-assembly, and secondary structure formation, and to retain very similar thermal properties (melting temperature in the range of 61.3 to 61.8° C.) and GC content (25-30%). However, we used start and end city sequences with an additional “GC cap” (a string of G or C residues) that raised the Tm of these sequences to 68-72° C. to further minimize secondary structure formation and inappropriate insertion of the starting and/or ending city sequence into the middle of the solution.

The yields of the synthesis for the DNA oligos were used to determine the distance matrix that defined the problem to be solved (i.e., the weight assigned to each connection polynucleotide (pathway sequence) is based on its synthesis yield), to ensure that we solved a randomly generated problem. Traditionally, the traveling salesman problem is constrained to distance matrices that fall in 2 or 3 dimensions. However, this reduces the complexity of the problem and does not test the full capability of the DNA computer. Thus, the distance matrix we chose to solve was not constrained, and solving it suffices to show that the technique can be used to solve any problem of lesser complexity. The distance matrix used is shown in Table 1.

TABLE 1
Distance Matrix (row = source city, column = destination city; entries are connection polynucleotide concentrations in ng/µl; ** = diagonal, no connector)

         A      B      C      D      E      F      G      H      I      J
A       **   55.2  34.05  31.75  53.85  39.95     36   39.9  36.55   52.6
B    63.95     **  54.25  54.95   72.6  45.05  71.65  50.55  52.75  52.15
C    51.35   47.6     **  41.45   39.8   57.8   55.2  32.75  34.85  37.05
D    46.65  46.25   54.6     **   49.4  45.55   55.9     52  57.35   54.6
E     49.9   39.1  42.65   52.4     **   25.9  39.85  38.85  37.95   33.1
F     59.7  49.15   47.8   56.9  58.05     **  48.05   46.6   48.4   47.7
G    51.25   36.7  43.95     43  42.45  40.25     **   64.2   47.8  46.95
H     58.1  35.85   53.7  45.05   47.3     43  84.25     **   42.8   41.9
I     52.9   38.2  40.35  33.45   36.5   65.2     35   29.7     **  30.95
J    60.05   39.1  40.65  55.75     41   41.1   45.1  58.65  43.95     **

For example, in row “A” under column “B” the entry is 55.2, and in row “B” under column “A” the entry is 63.95; the former represents the A-B connection polynucleotide, while the latter represents the B-A connection polynucleotide.
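Because the matrix is asymmetric, it helps to fix the indexing convention when using Table 1 computationally. This sketch stores the table as nested dictionaries indexed as `D[source][destination]`, following the row/column reading described above; the diagonal entries (**) are stored as None. The variable and function names are illustrative.

```python
CITIES = "ABCDEFGHIJ"

# Table 1, one row per source city; "**" marks the diagonal.
TABLE_1 = """
A ** 55.2 34.05 31.75 53.85 39.95 36 39.9 36.55 52.6
B 63.95 ** 54.25 54.95 72.6 45.05 71.65 50.55 52.75 52.15
C 51.35 47.6 ** 41.45 39.8 57.8 55.2 32.75 34.85 37.05
D 46.65 46.25 54.6 ** 49.4 45.55 55.9 52 57.35 54.6
E 49.9 39.1 42.65 52.4 ** 25.9 39.85 38.85 37.95 33.1
F 59.7 49.15 47.8 56.9 58.05 ** 48.05 46.6 48.4 47.7
G 51.25 36.7 43.95 43 42.45 40.25 ** 64.2 47.8 46.95
H 58.1 35.85 53.7 45.05 47.3 43 84.25 ** 42.8 41.9
I 52.9 38.2 40.35 33.45 36.5 65.2 35 29.7 ** 30.95
J 60.05 39.1 40.65 55.75 41 41.1 45.1 58.65 43.95 **
"""


def parse_matrix(text):
    """Return D with D[source][destination] = concentration (None on the diagonal)."""
    matrix = {}
    for line in text.strip().splitlines():
        source, *values = line.split()
        matrix[source] = {
            dest: None if v == "**" else float(v)
            for dest, v in zip(CITIES, values)
        }
    return matrix


D = parse_matrix(TABLE_1)
print(D["A"]["B"], D["B"]["A"])  # 55.2 63.95
```

Here `D["A"]["B"]` is the A-B connector (55.2) and `D["B"]["A"]` is the B-A connector (63.95), matching the reading of the table given above.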

b. Hybridize City and Pathway DNA Sequences Based on a Distance Matrix Between Cities.

We set an approximate 10:1 (city:linker) concentration ratio for the hybridization and ligation reactions. This ensures that city sequences are present at saturating concentrations relative to the pathway sequences, while pathway sequence concentrations are limiting and are varied such that any potential linking between any pair of cities depends entirely on the concentration of the corresponding linker. Table 1 shows the concentration differences for all linkers that were used for solving the 10-city problem.

The differences in linker concentrations from the vendor were used to define the problem, with the lowest concentration being 25.3 ng/µl and the highest being 84.5 ng/µl. As will be understood by those of skill in the art, any other parameter applicable to a given problem can be used for determining the weight assigned to the connection polynucleotides (pathway sequences).

The initial solution pool was generated through hybridization and ligation adapted from the protocol of Adleman (1994). In addition, we developed and performed a two-step protocol to generate the initial solution pool. First, the initial hybridization/ligation was conducted in a mixture lacking the ending city sequence and all potential linkers to the ending city. This greatly reduced the formation of shorter solutions, thus improving hybridization/ligation efficiency. Second, hybridization/ligation was allowed to continue after the addition of fresh ligase, the ending city sequence, and the corresponding linkers.

c. Removal of Answer Sequences Lacking a City Sequence.

1. Isolation of Solutions

Following ligation and PCR amplification of the ligation products at a hybridization Tm of 68-70° C., the ligation products were purified by denaturing polyacrylamide gel electrophoresis (PAGE), with the following modifications to avoid smearing due to secondary structure formation:

(1) Inclusion of 8 M urea and 30% formamide as denaturants in the gel;

(2) Adjustment of the speed of gel polymerization by chemical and photochemical catalysis: we used 20 ml of liquid acrylamide, added 100 μl of 10% pyrophosphate, and polymerized the gel under a light; and

(3) Immersion of the PAGE device in a 60-65° C. water bath during electrophoresis.

Using this protocol, 230-mer DNA bands were clearly resolved, and further PCR purification using the 230-mer DNA template extracted from the excised PAGE gel generated sharp DNA bands of the desired 230-mer length. This confirms that we collected the 230-mer, 10-city solutions (ligation products) with the correct starting and ending cities.

2. Probing Solutions with the DNA Complement to a City Sequence Tethered to Magnetic Beads.

Magnetic-bead affinity purification of the 230-mer solutions, using biotinylated oligo probes containing the complementary sequence for each city, was carried out to ensure that every city was visited once and only once. This procedure removed answer sequences that were missing any one city. To implement magnetic affinity purification, a single-stranded DNA (ssDNA) solution pool was generated by PCR amplification of the 230-mer solutions using a 3′-biotinylated primer. Biotinylated DNA complementary to each city sequence was attached to streptavidin-functionalized magnetic beads. The amplified solution pool was then incubated with magnetic beads bound to the complementary sequence of City A; DNA strands containing the City A sequence were captured with a magnet, and the remaining DNA was washed away.
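The logic of the gel and bead steps can be sketched in Python, assuming each strand is idealized as a string of hypothetical one-letter city labels (one letter per city segment) and that bead capture is perfect. It shows why length selection followed by one capture round per city suffices: a strand with exactly as many city segments as there are cities, and that contains every city, must contain each city exactly once.

```python
from typing import Iterable, List


def bead_capture(pool: Iterable[str], city: str) -> List[str]:
    """One affinity round: retain only strands that contain the city label."""
    return [strand for strand in pool if city in strand]


def screen(pool: Iterable[str], cities: str, n_slots: int) -> List[str]:
    """Gel step (keep strands with exactly n_slots city segments), then one
    capture round per city. With n_slots == len(cities), every survivor
    contains each city exactly once (pigeonhole argument)."""
    survivors = [s for s in pool if len(s) == n_slots]
    for city in cities:
        survivors = bead_capture(survivors, city)
    return survivors


if __name__ == "__main__":
    pool = ["ABCD", "ABBD", "ACBD", "ABC", "ABCDD"]
    print(screen(pool, "ABCD", 4))  # ['ABCD', 'ACBD']
```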

Answer sequences were then recovered from the beads by a chemical denaturing protocol devised to obtain the screened ssDNA solution without the biotinylated city probe. For chemical denaturation, 50 μl of 0.1 N NaOH was used, followed by the addition of 7 μl of 1 M HCl for neutralization.

Using this protocol, we found that after all 10 cities had been screened by magnetic-bead affinity purification, a sharp 230-mer DNA band of the answer solutions remained.

d. Determine Abundance of Adjacent City Sequences to Read the Output.

The techniques developed to produce the solution set were efficient enough to allow us to use a combination of PAGE and ligation chain reaction (LCR) techniques (see, for example, WO 2005/080603) to read out the answer to the computed problem. The 3′-biotinylated probe for City A was first added to the answer solution together with the 5′-biotinylated probe for City B, followed by ligation to build DNA bridges (FIG. 6). Quantitation of these bridges indicates how often City A immediately precedes City B in the answer set. This is repeated for City A with each of the other cities, and ultimately for all possible city pairings. LCR products were profiled on a PAGE gel, and the relative abundance of each potential pairing of cities was measured as total density using the UVP GDS-8000 BioImaging system.

The concentration of each probe was measured in triplicate using the NanoDrop method, and a saturating concentration for each probe was established. This ensures that the yield of LCR product for a given link between two cities can be attributed to its abundance in the answer solution. The inset of FIG. 6 shows a typical LCR profile on a PAGE gel, illustrating the probabilistic links between City D and the other 9 cities.

Quantitative determination of the yield of LCR product in each lane of the gel was accomplished by: (1) measuring the total density of the upper DNA band (the LCR product) and of the lower DNA band (the unligated LCR probes); (2) measuring the total density of the 100-mer band (the brightest one) of the DNA ladder; (3) normalizing the total densities of the LCR product and probes against the 100-mer DNA ladder band; and (4) determining the ratio of the normalized total density of the LCR product to the normalized total density of the LCR probes. These ratios were then stored in a matrix called R, the elements of which represent a global measure of the abundance of each city pairing in the solution set.
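A minimal sketch of steps (1)-(4), assuming the band densities have already been measured for each lane; the function names and the densities in the usage example are hypothetical. The ratio is taken as the ligated LCR product (upper band) over the unligated probes (lower band), and ordered pairs with no lane are recorded as 0.

```python
def lcr_ratio(product: float, probe: float, ladder: float) -> float:
    """Normalize the LCR product band and the unligated-probe band to the
    100-mer ladder band, then return normalized product / normalized probe."""
    if product < 0 or probe <= 0 or ladder <= 0:
        raise ValueError("band densities must be positive")
    return (product / ladder) / (probe / ladder)


def build_r(lanes: dict, cities: str) -> dict:
    """lanes maps an ordered pair (first, second) to the (product, probe, ladder)
    densities measured in that lane. Pairs without a lane are recorded as 0.0."""
    return {
        a: {b: lcr_ratio(*lanes[(a, b)]) if (a, b) in lanes else 0.0
            for b in cities if b != a}
        for a in cities
    }


if __name__ == "__main__":
    # Hypothetical densities for two lanes of one gel (arbitrary units).
    lanes = {("D", "A"): (1200.0, 4800.0, 3000.0), ("D", "B"): (300.0, 6000.0, 3000.0)}
    r = build_r(lanes, "ABD")
    print(r["D"])  # {'A': 0.25, 'B': 0.05}
```

Because both bands in a lane are divided by the same ladder density, the ladder term cancels within a lane; the normalization matters only when normalized densities are compared across lanes or gels.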

The matrix formed from the city pairings as a result of the gel analysis is shown in Table 2.

TABLE 2 Matrix formed through the LCR gel readout.

*** indicates paths that are not contained in the subset generated.

This table redefines the TSP from one with about 3.6 million (10!) possible solutions to one with ~250,000 solutions. A conventional computer running a standard brute-force algorithm was used to find the optimal answer among the ~250,000 solutions generated.

The number of possible solutions in R is substantially less than in the original distance matrix (as calculated by a conventional computer) because the total number of DNA molecules used in the calculation was insufficient to generate all of the possible answers. The number of possible solutions is limited by the rows and columns that have the fewest possible transitions, where a “transition” means moving from one city to another. Each row or column that has fewer than 9 transitions limits the number of degrees of freedom (i.e., how many other cities it can directly connect to) that any path may travel. For example, H may only be traveled to from I or J, and thus it has a degree of freedom of 6. The zeros in the above matrix mean that a particular road does not exist and cannot be traveled; a move from one city to another is possible only where the matrix has a nonzero value. To find the maximum number of potential solutions, we begin with the city that has the smallest degree of freedom and work up from there. In this case, the number of possible solutions can be calculated by taking, at each step, the minimum of the lowest remaining degree of freedom and the number of remaining cities to move to. In this way we determined that the matrix generated by the DNA computer has at most 7×7×7! = 246,960 answer sets, or about 6.8% of the possible solutions of the original problem. Thus, our DNA computer reduced a problem with about 3.6 million possible solutions to one with at most 246,960.
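The counting rule can be written as a short function. Table 2's entries are not available in this text, so the degrees used below are an assumption: two cities limited to 7 transitions each, which is what the 7×7×7! product implies, with all other positions unconstrained (9 transitions). The 6.8% figure corresponds to dividing 246,960 by 10! = 3,628,800. The function name is hypothetical.

```python
import math


def max_solutions(constrained_degrees, n_cities):
    """Bound described in the text: fill the most constrained positions first;
    at position k, multiply by min(degree of freedom, cities still unvisited)."""
    slots = n_cities - 1                                 # cities visited after the fixed start
    degrees = sorted(constrained_degrees)
    degrees += [n_cities - 1] * (slots - len(degrees))   # unconstrained: n - 1 transitions
    total = 1
    for k, d in enumerate(degrees[:slots]):
        total *= min(d, slots - k)
    return total


if __name__ == "__main__":
    bound = max_solutions([7, 7], 10)                   # assumed degrees of the two most constrained cities
    print(bound)                                        # 246960
    print(round(100 * bound / math.factorial(10), 1))   # 6.8 (percent of 10! tours)
```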

The design of our DNA computer is based on the hypothesis that when the amount of DNA city and pathway sequences used for the computation is insufficient to generate all possible answers, the subset of possible answers generated will be the most probable and hence will include the optimal solution. In so doing, the DNA computation would serve to reduce the problem to one with a number of solutions searchable by a conventional computer. We were able to test this hypothesis using the computational results in R because the number of solutions to the 10-city problem was small enough to be searched by a conventional computer.

Using a conventional computer to perform a brute-force search of all possible solutions, the optimal solution was found to be AFEJBCDIHGA. The answer set determined by the DNA computer included the correct solution. Conventional computing was then used to determine the best 1000 answer solutions and rank them from best to worst. The answer solutions from the DNA computation were then examined to identify which city pairings were not contained in the subset of solutions generated by the DNA computer. The DNA computer successfully generated the 24 most optimal answers; the first answer not included in the answer set was the 25th most optimal. The number of answers excluded by the DNA computer increased in proportion to the decrease in optimality. These results demonstrate that our DNA computer successfully computed the 10-city traveling salesman problem.
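The conventional-computing side of this test can be sketched as brute-force enumeration followed by a membership check against R. This is a hypothetical illustration on a toy 4-city matrix, not the code used in the experiments; because the text does not restate whether lower or higher totals are better for this matrix, the direction is left as an assumed maximize flag. A tour counts as generated by the DNA computer when every ordered pair along it has a nonzero entry in R.

```python
from itertools import permutations


def tour_cost(weights, tour):
    """Sum of directed weights along a closed tour."""
    return sum(weights[a][b] for a, b in zip(tour, tour[1:]))


def rank_tours(weights, start, maximize=False):
    """Brute force: every closed tour from `start`, best first (9! = 362,880 for 10 cities)."""
    others = [c for c in weights if c != start]
    tours = [(start, *p, start) for p in permutations(others)]
    return sorted(tours, key=lambda t: tour_cost(weights, t), reverse=maximize)


def in_dna_subset(tour, r_matrix):
    """True if every ordered pair on the tour has a nonzero entry in R."""
    return all(r_matrix[a][b] > 0 for a, b in zip(tour, tour[1:]))


def first_excluded_rank(ranked, r_matrix):
    """1-based rank of the best tour that the DNA-generated subset lacks."""
    for rank, tour in enumerate(ranked, start=1):
        if not in_dna_subset(tour, r_matrix):
            return rank
    return None


if __name__ == "__main__":
    w = {"A": {"B": 1, "C": 5, "D": 9}, "B": {"A": 4, "C": 2, "D": 8},
         "C": {"A": 6, "B": 7, "D": 3}, "D": {"A": 1, "B": 9, "C": 9}}
    r = {"A": {"B": 1, "C": 0, "D": 0}, "B": {"A": 0, "C": 1, "D": 0},
         "C": {"A": 0, "B": 0, "D": 1}, "D": {"A": 1, "B": 0, "C": 0}}
    print(first_excluded_rank(rank_tours(w, "A"), r))  # 2
```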

EXAMPLE 3 Solving a 15 City Traveling Salesman Problem

In this example, manageable amounts of DNA are used to solve a 15-city asymmetric traveling salesman problem (TSP), the largest problem solved by molecular computing to date. As discussed above, the TSP is to find the optimal path between all desired cities, starting and ending at the same city and visiting each city once and only once. The methods herein exploit the truly random molecular process of Brownian motion inherent in molecular interactions to generate an optimal subset of answers. A similar process is not possible using in silico computers, since truly random samples cannot be generated by deterministic circuits.

Concentration control was used to implement a probabilistic computation to identify the optimal path for a fully random, connected, asymmetric 15-city TSP. When the concentration of the pathway DNA is used to encode optimality, the answer population is symmetrically distributed, with the mode and mean corresponding to the optimal answer. Generating a subset of answers is equivalent to taking a random sample of the population, where the population mean μ is related to the sample mean by the classic relation μ = sample mean ± error. The error can be estimated as the standard deviation of the sample divided by the square root of the sample size. In this case the sample size is the number of molecules that form correct answers to the problem (i.e., answers with one and only one copy of each city). Amplification of the correct solutions allows the sample size to be controlled. Thus, the error can be controlled so that the sample mean lies within any desired confidence interval of the population mean. Consequently, the number of possible answers to the problem can exceed the number of molecules in the reaction, allowing large problems to be computed with manageable amounts of DNA. (As will be understood by those of skill in the art based on the teachings herein, each problem will have a different range over which inspection of the gel is sufficient to see the answer; we have not determined these bounds for the problems solved here.)
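The sampling relation above can be written as μ ≈ x̄ ± z·s/√N, where x̄ is the sample mean, s the sample standard deviation, N the sample size, and z a normal-approximation multiplier (z = 1.96 for roughly 95% confidence is a standard choice, assumed here). The sketch below computes the interval and the smallest N that achieves a desired margin; the function names and the sample values are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def confidence_interval(sample, z=1.96):
    """Interval mean +/- z * standard error (normal approximation)."""
    center = statistics.mean(sample)
    half_width = z * standard_error(sample)
    return center - half_width, center + half_width


def required_sample_size(sd, margin, z=1.96):
    """Smallest N such that z * sd / sqrt(N) <= margin."""
    return math.ceil((z * sd / margin) ** 2)


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical measurements
    print(confidence_interval(sample))
    print(required_sample_size(sd=10, margin=1))   # 385
```

The design choice worth noting is the inverse-square dependence: halving the desired margin requires roughly four times as many correct-answer molecules, which is what amplification of correct solutions is meant to supply.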

Table 3 shows the matrix of pathway concentration ratios that defines the specific problem. Each letter represents a different city in the TSP, and city A serves as the starting and ending point. The experiments to implement the algorithm were divided into four steps:

Step 1: A set of unique DNA sequences was created for each city andbinary pathway.

Step 2: City and pathway DNA sequences were hybridized based on thedistance matrix.

Step 3: Solutions that did not satisfy the TSP requirement were removed.

Step 4: The abundance of adjacent city sequences was determined to read out the optimal solution set.
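Steps 2 to 4 can be mimicked in silico as a toy, purely to show how concentration-weighted assembly, filtering, and pair counting fit together. It is not a model of hybridization kinetics: it assumes each next city is chosen with probability proportional to the pathway concentration, uses integer city labels with 0 standing for city A, and relies on Python's pseudorandom generator rather than the truly random molecular process described in this example. The concentration pattern in the usage example follows Table 3 (100 for the alphabetical successor, 1 otherwise) on 5 cities to keep it quick; all function names are hypothetical.

```python
import random
from collections import Counter


def simulate_strands(conc, n_strands, seed=0):
    """Step 2 toy: grow each strand from city 0, picking the next city with
    probability proportional to conc[current][next]. Revisits are allowed
    (as in the tube); the loop is then closed to city 0, so the closing
    concentration conc[last][0] is not used by this sketch."""
    rng = random.Random(seed)
    n = len(conc)
    strands = []
    for _ in range(n_strands):
        path = [0]
        for _ in range(n - 1):
            cur = path[-1]
            options = [c for c in range(1, n) if c != cur]
            weights = [conc[cur][c] for c in options]
            path.append(rng.choices(options, weights=weights, k=1)[0])
        path.append(0)
        strands.append(path)
    return strands


def keep_valid(strands, n):
    """Step 3 toy: keep strands containing every non-start city exactly once."""
    return [s for s in strands if sorted(s[1:-1]) == list(range(1, n))]


def pair_abundance(strands):
    """Step 4 toy: count how often city i immediately precedes city j."""
    counts = Counter()
    for s in strands:
        counts.update(zip(s, s[1:]))
    return counts


if __name__ == "__main__":
    n = 5
    conc = [[100 if j == (i + 1) % n else 1 for j in range(n)] for i in range(n)]
    valid = keep_valid(simulate_strands(conc, 2000), n)
    counts = pair_abundance(valid)
    best_next = [max(range(n), key=lambda j: counts[(i, j)]) for i in range(n)]
    print(len(valid), best_next)
```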

To implement Step 1 of the algorithm, each city was associated with a synthetic 20-mer DNA sequence. The sequences were chosen to minimize cross-hybridization, with melting points ranging from 60.6 to 62° C. and GC content from 20% to 35%. For each pathway connecting two different cities, a 20-mer DNA was created such that the pathway sequence hybridizes to the first half of one city and the second half of another. FIG. 8 gives an example of city B, city C, and the pathway BC that connects them. This construction allowed the set of city sequences to be linked sequentially in all possible combinations to form long solution strands upon addition of ligase. Answers were formed by combining 100 pmol of each city sequence, 100 pmol of each of the pathways A_(start)→B, B→C, C→D, . . . , N→O, O→A_(end), and 1 pmol of every other pathway in distilled water. The solution was heated to 92° C. for 4 min, cooled at 1° C./min to 4° C., and then incubated at 8° C. for ~16 h after addition of T4 ligase (5 Weiss Units), 20 mM DTT, and 10 mM ATP. This treatment was repeated prior to a final addition of T4 ligase (5 Weiss Units), 20 mM DTT, and 10 mM ATP, followed by incubation for 16 h at 8° C.
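The pathway construction can be illustrated with an Adleman-style splint, assuming (this orientation is an assumption, not stated in the text) that the pathway oligo is the reverse complement of the 3′ half of the preceding city joined to the 5′ half of the following city. The two 20-mer city sequences below are hypothetical placeholders, not the sequences used in the experiments, and the function names are likewise hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def pathway_oligo(city_from: str, city_to: str) -> str:
    """Assumed splint design: reverse complement of the 3' half of city_from
    joined to the 5' half of city_to, so it bridges the two city segments."""
    junction = city_from[len(city_from) // 2:] + city_to[: len(city_to) // 2]
    return reverse_complement(junction)


if __name__ == "__main__":
    city_b = "ACGTTAGCATCGATTACGAT"   # hypothetical 20-mer for city B
    city_c = "TTGACCATAGGCTAAGTCAA"   # hypothetical 20-mer for city C
    bc = pathway_oligo(city_b, city_c)
    print(bc, len(bc), round(gc_content(bc), 2))
```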

Fourteen unique DNA 20-mers with similar melting temperatures were designed and assigned to cities B-O. Two city A sequences were designed to serve as the starting and ending points, each a 40-mer with a higher GC content than cities B-O so that they could serve as PCR primers. Intercity pathway 20-mers hybridized with the first and last halves of cities to assemble city strands sequentially, in an order corresponding to a particular route. Pathways were added in amounts that varied with the efficiency of each path: low- and high-efficiency paths were added at 1 and 100 pmol, respectively, where the high-efficiency paths bridged cities in alphabetical order.

In addition, two sequences representing the starting city A_(s) and the ending city A_(e) were designed to replace the single city A sequence (FIG. 9). The second half and the first half of the original city A sequence were used in the new starting and ending sequences, respectively. Each half of the original was extended by a unique 10-mer sequence with 100% GC content to raise the melting temperature and eliminate nonspecific annealing during PCR amplification. An additional 5-mer GC cap was added to prevent the two end-point cities from being inserted in the middle of the solutions. To further reduce formation of incomplete solution sequences, Step 2 was performed in two stages. First, hybridization/ligation was initiated in the absence of the ending city A_(e) and all pathways directed to A_(e), for a period sufficient to enable strand ligation. Second, the reaction was continued by adding fresh ligase, the ending city sequence, and the corresponding pathways to produce complete final solutions that start and end with the same city.

To implement Step 3, the hybridization/ligation products of Step 2 were purified by PCR amplification using the 5′-starting and 3′-ending city primers. The PCR primers were designed to complement unique sequences of A_(s) and A_(e); consequently, only molecules encoding routes that begin and end with the assigned city could be amplified. The PCR products were profiled on a denaturing polyacrylamide gel, and the 330-mer band, corresponding to DNA encoding round-trip routes that contain exactly 16 city sequences (25-mer A_(s) and A_(e), and 20-mers for the other 14 cities), was excised (FIG. 10 (a)(A)). These PCR amplification and gel purification procedures were repeated several times to enhance the purity of the target DNA (FIG. 10 (a)(B)).

Ligation products were separated on a 6% denaturing PAGE gel with 8.5 M urea and 30% formamide (acrylamide:bisacrylamide, 29:1) in 100 mM Tris-Cl, pH 8.3, 83 mM boric acid, and 2 mM EDTA at ~55° C. under 8 V/cm. The polymerization speed of the gel was controlled using chemical and photochemical catalysis to increase resolution. Electrophoresis was performed at ~65° C. The 340mer band was collected from the gel using a Qiagen gel extraction kit. Answer sequences containing A_(start) and A_(end) were purified by magnetic affinity, followed by PCR amplification in a 50 μL reaction mixture containing 10 mM Tris-HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl₂, 0.001% gelatin, 200 μM of each dNTP, 0.4 μM of each of the A_(start) and A_(end) primers, a 1:100 ratio of Bst and Taq DNA polymerase, and 0.1-0.2 μg of DNA template. The PCR was performed in a PJ2000 DNA thermal cycler programmed for a hot start at 94° C. for 2.5 min, followed by 35 cycles of 94° C. for 0.5 min, 70° C. for 35 sec, and 72° C. for 30 sec, and concluded by 3 min at 72° C.
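Two pieces of arithmetic in this step can be checked quickly: the expected length of a complete round trip from the segment lengths given for the 330-mer band (25-mer A_(s) and A_(e), 20-mers for the other 14 cities), and the total programmed hold time of the PCR program. The sketch ignores ramping between temperatures, and the function names are hypothetical.

```python
def expected_strand_length(n_cities: int, end_len: int = 25, city_len: int = 20) -> int:
    """A_(s) and A_(e) contribute end_len each; every other city one city_len segment."""
    return 2 * end_len + (n_cities - 1) * city_len


def pcr_hold_seconds(cycles=35, hot_start=150, steps=(30, 35, 30), final=180) -> int:
    """Sum of the programmed hold times in seconds (ramping ignored)."""
    return hot_start + cycles * sum(steps) + final


if __name__ == "__main__":
    print(expected_strand_length(15))        # 330
    print(round(pcr_hold_seconds() / 60, 1))  # 60.9 minutes of hold time
```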

To ensure that each 330-mer strand contained all the cities, the multiply purified DNA sequences were probed with a biotin-avidin magnetic bead system. This was accomplished by first incubating the products with complementary sequences to city B conjugated to magnetic beads. Using a magnetic separator, only the strands that contained the city B sequence (and hence encoded routes that enter city B at least once) were retained by hybridization to the immobilized probes. This process was repeated successively with every city, and the final products were amplified by PCR and run on a gel. The single bright band in FIG. 10 (a)(C) represents the final separated DNA solutions that start and end at city A and also include each of the other 14 cities once and only once. It is noteworthy that this band contained not only the optimal TSP solution in the greatest abundance but also other correct but suboptimal Hamiltonian loops (FIG. 10(b)). The PCR products were profiled on a 6% denaturing PAGE gel as described above. The amplified solution was filtered using sequential magnetic affinity purification for each of the cities B-O.

To implement Step 4, the final answer pool from Step 3 was probed with pairs of probes complementary to two selected cities, in all possible combinations. When the two cities were adjacent in a specific order, the complementary probes became covalently linked through the ligation chain reaction (LCR). LCR was performed with equal, saturating amounts of probe to ensure that the yield of LCR product for a given link between two cities was limited by the abundance of the corresponding answers. In the 15-city problem, 210 reactions were run, one for each possible ordered pairing (15 × 14), and the products were profiled on PAGE gels (FIG. 9). The lower band in each lane corresponded to 20-mer DNA probes that were not ligated during LCR. The upper band corresponded to the 40-mer composed of the ligated probes and thus indicated the abundance of that ordered city pair in the answer pool. The LCR reaction mixture contained 10× buffer, 5 μL of target DNA (10 ng), 40 units of Taq ligase (NEB), and 400 nM of each of 4 probes in a 50 μL final volume. The 4 probes comprised two city sequences and their complements. For each reaction, one sequence of each pair was phosphorylated (city1:P*-city2 and P*-complement1:complement2) to ensure that ligation could occur only if city1 immediately preceded city2. The LCR was initiated by heating the solution to 94° C. for 2.5 min, followed by 25 cycles, each consisting of 94° C. for 25 sec, 41° C. for 35 sec, and 45° C. for 150 sec. LCR products were profiled on a 10% PAGE gel (acrylamide:bisacrylamide, 29:1) in 100 mM Tris-Cl, pH 8.3, 83 mM boric acid, and 2 mM EDTA at ~55° C. under 8 V/cm. All PAGE gels were stained with ethidium bromide (1 mg/ml) for 10 min, then visualized and photographed with a UVP BioDoc-It™ UV transilluminator.
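The reaction layout can be enumerated programmatically. This hypothetical sketch builds one entry per ordered city pair with the four probes labelled as in the text, and groups the reactions by preceding city (one gel per preceding city); the labels are placeholders rather than sequences.

```python
from itertools import permutations


def lcr_reactions(cities: str):
    """One reaction per ordered pair (city1 immediately before city2), with the
    four probes labelled after the text: city1, P*-city2, P*-complement1,
    complement2 ('P*' marks the phosphorylated probes)."""
    return [
        ((c1, c2), (c1, f"P*-{c2}", f"P*-comp({c1})", f"comp({c2})"))
        for c1, c2 in permutations(cities, 2)
    ]


def group_by_gel(reactions):
    """One gel per preceding city, holding every pair that starts with it."""
    gels = {}
    for (c1, c2), probes in reactions:
        gels.setdefault(c1, []).append((c2, probes))
    return gels


if __name__ == "__main__":
    rx = lcr_reactions("ABCDEFGHIJKLMNO")
    print(len(rx), len(group_by_gel(rx)))   # 210 15
```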

TABLE 3 Matrix for concentration ratio in the 15-city TSP

     A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
A    ***  100  1    1    1    1    1    1    1    1    1    1    1    1    1
B    1    ***  100  1    1    1    1    1    1    1    1    1    1    1    1
C    1    1    ***  100  1    1    1    1    1    1    1    1    1    1    1
D    1    1    1    ***  100  1    1    1    1    1    1    1    1    1    1
E    1    1    1    1    ***  100  1    1    1    1    1    1    1    1    1
F    1    1    1    1    1    ***  100  1    1    1    1    1    1    1    1
G    1    1    1    1    1    1    ***  100  1    1    1    1    1    1    1
H    1    1    1    1    1    1    1    ***  100  1    1    1    1    1    1
I    1    1    1    1    1    1    1    1    ***  100  1    1    1    1    1
J    1    1    1    1    1    1    1    1    1    ***  100  1    1    1    1
K    1    1    1    1    1    1    1    1    1    1    ***  100  1    1    1
L    1    1    1    1    1    1    1    1    1    1    1    ***  100  1    1
M    1    1    1    1    1    1    1    1    1    1    1    1    ***  100  1
N    1    1    1    1    1    1    1    1    1    1    1    1    1    ***  100
O    100  1    1    1    1    1    1    1    1    1    1    1    1    1    ***

A carefully controlled concentration matrix replaced the previous random, fully connected one, guaranteeing that one Hamiltonian path is clearly optimal compared with the others. This path was selected to be ABCDEFGHIJKLMNOA by adding the pathway sequences (i.e., AB, BC, CD, etc.) at a concentration 100-fold greater than the pathway sequences that would lead to suboptimal answers (Table 3). It is noteworthy that, because of the purposely assigned optimal path and the corresponding concentration control, this experiment tested the feasibility of the method rather than performing a blind search for an unknown solution as was done in the 10-city TSP.

Another important modification of the protocol for the 15-city problem was improving the purity of the oligonucleotides used in the hybridization/ligation reaction. The initial set of 20-mer sequences representing cities and pathways was ordered from Invitrogen Inc. at approximately 70% purity. For the ligation reaction, these sequences, except for the starting city A_(s) and the pathways directed from A_(s) to subsequent cities, had to be phosphorylated so that phosphodiester bonds could form between adjacent sequences upon ligation. Three factors were therefore identified that affect the efficiency of the answer-generation step: 1) the purity of the initial oligonucleotides; 2) the efficiency of the phosphorylation reaction; and 3) the efficiency of the ligation reaction. If these were 70%, 75%, and 50%, respectively, the probability of producing a whole strand would approach zero (0.7×0.75¹⁵×0.5¹⁴ ≈ 5.7×10⁻⁷). To increase the yield of complete answer sets, all initial oligonucleotides were purified both before and after phosphorylation.
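The estimate above treats the three factors as independent and multiplies them, with 15 phosphorylation events and 14 ligation events as written in the expression. A one-line function makes it easy to see how sensitive the yield is to each efficiency; the function name is hypothetical and the default exponents are chosen to match that expression.

```python
def full_strand_probability(purity, phos_eff, lig_eff, n_phos=15, n_lig=14):
    """purity x phos_eff**n_phos x lig_eff**n_lig, treating factors as independent."""
    return purity * phos_eff ** n_phos * lig_eff ** n_lig


if __name__ == "__main__":
    print(f"{full_strand_probability(0.70, 0.75, 0.50):.1e}")  # 5.7e-07
```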

High-performance liquid chromatography (HPLC) coupled to indirect ultraviolet detection was used for DNA purification. For oligonucleotide purification, each DNA sample representing a city or pathway was injected into the buffer flow path of the HPLC system. Positively charged molecules in solution could associate with the negatively charged phosphate backbone of the DNA sequences. The resulting complex behaved as a typical hydrophobic molecule and was attracted to the neutral (hydrophobic) beads in the separation cartridge. A wash buffer containing acetonitrile (ACN) was used to break the hydrophobic interaction between the DNA complex and the cartridge. As the ACN concentration increased over time, the bridging capability of the positive ions decreased and the DNA fragments were released from the cartridge. The UV detector measured the absorbance of the DNA fragments passing through it. Only the product corresponding to the main peak was collected and concentrated to a high stock concentration using the thermal centrifuge device. This significantly improved the purity of the oligonucleotides prepared for further reaction.

The HPLC system was also used to separate phosphorylated DNA. Because of the added phosphate groups, phosphorylated products elute from the cartridge earlier than unphosphorylated molecules.

The efficiency of the ligation reaction was improved by extending the hybridization/ligation reaction into a three-day protocol. The reaction mixture was first heated to 92° C. for 4 minutes, followed by programmed cooling at 1° C. per minute to 8° C., and then maintained overnight. On the second day, the heating-cooling procedure was repeated, and the overnight incubation was carried out with additional fresh ligase and reaction buffer. On the last day, more fresh ligase and buffer were added to the solution for overnight incubation. This method substantially promoted the formation of answer sequences long enough to represent correct answers.

The answer readout profile of the 15-city problem is shown in FIG. 10. The pathways contained in the optimal solution were obvious in each group of LCR reactions, enabling the optimal answer to be determined by inspection. The agreement between the final optimal answer calculated by DNA computing and the initial design confirms the feasibility of this modified method. Several minor bands present at the 40-mer position indicate that, besides the best answer, the DNA computer also produced some suboptimal answers, showing that the computer was indeed sampling the entire answer space.

In the 15-city TSP solved here, answer strands were assembled through hybridization and ligation. DNA of the correct length (340mer) was purified by PAGE, amplified, and then filtered by sequential magnetic affinity purification using the complement to each city to ensure that each city was visited once and only once. This resulted in a population of DNA in which solutions to the 15-city problem occur in amounts proportional to their optimality. The optimal answer was determined using a series of ligation chain reactions, in which each reaction tested one possible ordered city-city pair. Successful ligation produced a 40mer, which indicated how often one city directly preceded another. The ligation products for each ordered city pair were separated by PAGE, with each gel testing the complete set of ordered pairs that share the same preceding city (e.g., A→B, A→C, A→D, etc.).
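Reading the optimal route by inspection amounts to following, from city A, the most abundant ordered pair leaving each city. The greedy sketch below is a hypothetical illustration of that reading, assuming a matrix of LCR signals keyed by preceding city and then following city. It is only reliable when each row has one clearly dominant pair, as in this concentration-controlled experiment; in general a full search over R, as in the 10-city example, is needed.

```python
def decode_tour(abundance, start):
    """Greedy read-out: from `start`, repeatedly move to the unvisited city with
    the strongest LCR signal, then close the loop back to `start`.
    abundance[a][b] is the signal for city a immediately preceding city b."""
    tour = [start]
    while len(tour) < len(abundance):
        row = abundance[tour[-1]]
        candidates = [b for b in row if b not in tour]
        tour.append(max(candidates, key=row.get))
    tour.append(start)
    return tour


if __name__ == "__main__":
    signals = {
        "A": {"B": 9, "C": 1, "D": 2},
        "B": {"A": 1, "C": 8, "D": 3},
        "C": {"A": 2, "B": 10, "D": 7},   # B is already visited, so D is chosen
        "D": {"A": 9, "B": 1, "C": 2},
    }
    print("".join(decode_tour(signals, "A")))   # ABCDA
```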

Taken together, the results of all the gels confirm that the optimal solution visited the cities in alphabetical order (A→B, B→C, C→D, etc.), showing that the optimal answer is easily determined by inspection (FIG. 11). Other pairings were observed in smaller abundance, indicating that the entire answer space was sampled during the computation. These results demonstrate that interactions of DNA molecules can be used to generate a random sample of a population, here by solving a problem with 1.3×10¹² possible solutions.

WORKS CITED

- Adleman, L. M. (1994). “Molecular computation of solutions to combinatorial problems.” Science 266(5187): 1021-1024.
- Hartmanis, J. (1995). “Response to the Essays on Computational-Complexity and the Nature of Computer-Science.” ACM Computing Surveys 27(1): 59-61.
- Lee, J. Y., S. Y. Shin, et al. (2004). “Solving traveling salesman problems with DNA molecules encoding numerical values.” Biosystems 78(1-3): 39-47.
- Tanaka, F., A. Kameda, et al. (2005). “Design of nucleic acid sequences for DNA computing based on a thermodynamic approach.” Nucleic Acids Research 33(3): 903-911.
- Yamamoto, M., A. Kameda, et al. (2002). “A separation method for DNA computing based on concentration control.” New Generation Computing 20(3): 251-261.

1. A method for generating a distribution of optimal solutions to a nondeterministic polynomial optimization problem comprising: (a) providing n input polynucleotides, wherein n equals a number of data inputs, and wherein each input polynucleotide: (i) represents a unique data input; and (ii) comprises an x segment and a y segment; (b) providing z connection polynucleotides, wherein z equals a number of unique connections that can be made between the different data inputs, and wherein each polynucleotide in the set of connection polynucleotides is complementary to the x segment of one input polynucleotide and to the y segment of one different input polynucleotide; (c) combining the input polynucleotides with the connection polynucleotides to form a mixture, wherein the combining is done under conditions to promote formation of hybridization complexes between complementary polynucleotides, and wherein each individual connection polynucleotide is added at a concentration based on a weight assigned to the individual connection polynucleotide; (d) ligating input polynucleotides that are present in a hybridization complex to form ligation products; and (e) determining a concentration of the ligation products, wherein the ligation products present at the highest concentration represent optimal solutions to the nondeterministic polynomial optimization problem.
 2. The method of claim 1, wherein step (c) comprises: (i) combining all input polynucleotides only with those connection polynucleotides that are complementary to the x or y segment of a starting input polynucleotide to form a mixture, wherein the combining is done under conditions to promote hybridization between complementary polynucleotides to form a first hybridization complex between the starting input polynucleotide, a second input polynucleotide, and one connection polynucleotide; and (ii) combining all remaining connection polynucleotides not combined in step (i) with the mixture, wherein the combining is done under conditions to promote hybridization between complementary polynucleotides, wherein the remaining connection polynucleotides are added at a concentration based on a weight assigned to each individual remaining connection polynucleotide.
 3. The method of claim 1, wherein step (c) comprises: (i) combining all input polynucleotides only with those connection polynucleotides that are complementary to the x or y segment of at least two, but less than all, of the input polynucleotides, to form a mixture, wherein the combining is done under conditions to promote formation of hybridization complexes between complementary polynucleotides, and wherein each individual connection polynucleotide is added at a concentration based on a weight assigned to the individual connection polynucleotide; and (ii) combining all remaining connection polynucleotides not combined in step (i) with the mixture, wherein the combining is done under conditions to promote hybridization between complementary polynucleotides, wherein the remaining connection polynucleotides are added at a concentration based on a weight assigned to each individual remaining connection polynucleotide.
 4. The method of claim 2, wherein step (c)(ii) is repeated a desired number of times.
 5. The method of claim 3, wherein step (c)(ii) is repeated a desired number of times.
 6. The method of claim 1, wherein the input polynucleotides are present in saturating concentration relative to the connection polynucleotides.
 7. The method of claim 1, wherein determining a concentration of the ligation products comprises determining a length of the ligation products.
 8. The method of claim 1, wherein the method comprises purifying those ligation products that contain each input polynucleotide prior to determining a concentration of the ligation products.
 9. The method of claim 1, wherein determining a concentration of the ligation products comprises determining an order of polynucleotides in the ligation products.
 10. The method of claim 9, wherein the detecting produces a reduced distance matrix with nonzero values only for those ligation products that exist in an optimal answer set.
 11. The method of claim 1, wherein the nondeterministic polynomial optimization problem is selected from the group consisting of evacuation planning, invasion response planning, supply chain problems, computer chip assembly, shortest path problems, graph theory problems, network design problems, sets and partitions problems, storage and retrieval problems, sequencing and scheduling problems, mathematical programming problems, algebra and number theory problems, and program optimization problems.
 12. A computer readable storage medium comprising a set of instructions for causing a processing device to execute procedures for carrying out the method of claim 1.